Preface


This book is based on a series of lectures given over recent years in Master's courses 
in probability. It provides a short, self-contained introduction to the ergodic theory 
of Markov chains on metric spaces. 

Although primarily intended for graduate and postgraduate students, certain 
chapters (e.g., one and two) can be taught at the undergraduate level. Others 
(e.g., four and five) can be used as complements to courses in measure or 
ergodic theory. Basic knowledge of probability, measure theory, and calculus is
recommended. A certain familiarity with discrete-time martingales is also useful, 
but the few results from martingale theory used in this book are all recalled in the 
appendix. Each chapter contains several exercises ranging from simple applications 
of the theory to more advanced developments and examples. 

Whether in physics, engineering, biology, ecology, economics, or elsewhere, 
Markov chains are frequently used to describe the random evolution of complex 
systems. Understanding and analyzing these systems requires, first of all, a
good command of the mathematical techniques that explain the long-term
behavior of a general Markov chain living on a (reasonable) metric space. Presenting
these techniques is, briefly put, our main objective. Questions that are central to this
book and that will be revisited repeatedly are: under which conditions does such a
chain have an invariant probability measure? If such a measure exists, is it unique? 
Does the empirical occupation measure of the chain converge? Does the law of the 
chain converge, and if so, in which sense and at which rate? 

There are a variety of tools to address these questions. Some rely on purely 
measure-theoretic concepts that are natural generalizations of the ones developed 
for countable chains (i.e., chains living on countable state spaces). This includes 
notions of irreducibility, recurrence (in the sense of Harris), petite and small sets, 
etc. Other tools assume topological properties of the chain such as the strong 
Feller or asymptotic strong Feller property (in the sense of Hairer and Mattingly). 
However, when dealing with a specific model, measure-theoretic conditions (such
as irreducibility) might be difficult to verify, and strong topological properties
(such as the strong Feller condition) are seldom satisfied. A powerful approach
is then to combine much weaker topological conditions, such as the (weak)
Feller condition, with controllability properties of the system to prove that certain
measure-theoretic conditions (e.g., irreducibility, existence of petite or small sets)
are satisfied. This approach is developed at length here and is a key feature of this
book.

The book is organized in eight chapters and a short appendix. Chapter 1 briefly 
defines Markov chains and kernels and gives their very first properties, the Markov 
and strong Markov properties. The end of the chapter gives a concise introduction 
to Markov chains in continuous time, also called Markov processes, as they appear 
in many examples throughout the book. 

Chapter 2 is a self-contained mini course on countable Markov chains. Classical 
notions of recurrence (positive and null) and transience are introduced. These are 
powerful notions, but when students meet them for the first time and have to verify 
that a specific chain is either recurrent or transient, they are often disoriented. 
Thus, we have chosen to spend some time here to show how these properties can 
be verified "in practice" with the help of suitable Lyapunov functions. We also
explain how Lyapunov functions can be used to provide estimates on the moments 
(polynomial and exponential) of hitting times for a point or a finite set. 

Certainly one of the most important results in the theory of countable chains is 
the ergodic theorem, which asserts that—for positive recurrent aperiodic chains— 
the law of the chain converges to a unique distribution. The final three sections of 
Chap. 2 are organized around this result. We first prove it quickly—by standard 
coupling—without any estimate on the rate of convergence. Then, the Lyapunov 
method is applied to investigate the behavior of renewal processes and provide short 
proofs of coupling theorems for these processes. Finally, relying on these coupling 
results, we revisit the ergodic theorem, this time with some convergence rates. 

On uncountable state spaces, the simplest (and also the most natural) examples 
of Markov chains are given by random dynamical systems (also called random 
iterative systems). These are systems such that the state variable at time n + 1 is a 
deterministic function of the state variable at time n and a “random” input sampled 
from a sequence of i.i.d. random variables. Chapter 3 is devoted to this type of 
chain and explains how any given "abstract" Markov chain can be represented by 
a random dynamical system. Some interesting examples (Bernoulli convolutions, 
Propp-Wilson algorithm) are presented in exercises. 

Chapter 4 starts with a detailed section on weak convergence, tightness, and 
Prohorov's theorem. Then, invariant probability measures are defined, and it is 
shown that, for a Feller chain, weak limit points for the family of empirical 
occupation measures are almost surely invariant probability measures. We discuss 
some practical tightness criteria (for the empirical occupation measures) based on 
Lyapunov functions. At this stage of the book, the reader understands that, under
a reasonable control of the chain at infinity (obtained for instance by a Lyapunov
function), uniqueness of the invariant probability measure amounts to stability: the
empirical occupation measures converge almost surely to some (unique) distribution,
regardless of the initial distribution. We therefore found this a good place to discuss
simple examples of uniquely ergodic chains (i.e., chains having a unique invariant
probability measure). This is done in the third section of Chap. 4, where we analyze
random dynamical systems obtained by random composition of contractions (or
mappings that contract on average). The penultimate section of the chapter is
devoted to ergodic theorems. We first prove several classical results (Poincaré 
recurrence theorem, Birkhoff ergodic theorem, and the ergodic decomposition 
theorem) and then show how they can be applied to Markov chains. Finally, we 
discuss invariant measures of continuous-time processes and explain how their 
properties (existence, ergodicity, uniqueness, ergodic decomposition, etc.) can be 
studied using discrete-time theory. 

Chapter 5 is devoted to various notions of irreducibility which ensure unique 
ergodicity. We start with the measure-theoretic notion of irreducibility (also called 
$\psi$-irreducibility) and then move on to more topological conditions. The accessible
set of a Feller chain is introduced, and its relations with the support of invariant 
probability measures are investigated. We then consider strong Feller chains and 
prove that for such chains ergodic probability measures have disjoint support. We 
also prove the Hairer-Mattingly theorem, which says that the same property holds 
under the weaker assumption that the chain is asymptotically strong Feller. These 
results have the useful consequence that, on a connected set, if there is an invariant 
probability measure having full support, the chain is uniquely ergodic. 

We then discuss in Chap. 6 the notions of petite sets, small sets and (weak) 
Doeblin points and show that the existence of an accessible weak Doeblin point 
implies irreducibility for (weak) Feller chains. This latter result is then applied to 
a variety of examples both in discrete time (random dynamical systems, random 
dynamical systems obtained by random switching between deterministic flows) 
and in continuous time (piecewise deterministic Markov processes, stochastic 
differential equations). This gives us the opportunity to show how the accessibility 
condition is naturally expressed as a control problem and how the Doeblin properties 
are naturally related to Hörmander-type conditions (for random switching models,
piecewise deterministic Markov processes, and SDEs). 

Chapter 7 introduces Harris recurrence. For uniquely ergodic chains, Harris 
recurrence equates to positive recurrence, meaning that for every bounded Borel 
(and not merely for every continuous) function, the Birkhoff averages of the function 
converge almost surely. We prove the important result that Harris recurrence 
(respectively positive recurrence) is implied by the existence of a recurrent petite 
set (respectively a petite set whose first return time is bounded in $L^1$). We also
discuss simple useful criteria (relying on Lyapunov functions) ensuring that a set is 
recurrent and provide moment estimates on the return times. 

Chapter 8 revolves around the celebrated Harris ergodic theorem. After revisiting 
the notions of total variation distance and coupling for two probability measures, 
we state a simple version of the Harris ergodic theorem where the entire state space 
is a petite set. Under this strong hypothesis, one has exponential convergence in total 
variation distance to the unique invariant probability measure. The same conclusion 
holds under the existence of a Lyapunov function that forces the Markov chain to 
enter a certain small set—a condition that is better adapted to noncompact state 
spaces, which are usually not petite. We give two different proofs for this latter 
version of Harris's ergodic theorem: first, the recent proof by Hairer and Mattingly,
based on the ingenious construction of a semi-norm for which the Markov operator
is a contraction; and second, a more classical proof using coupling arguments and
ideas from renewal theory. More precisely, under uniform estimates on polynomial
(respectively exponential) moments for the return times to an aperiodic and recurrent
small set, we obtain polynomial (respectively exponential) convergence in total
variation distance to the unique invariant probability measure. Finally, we present a 
condition, also due to Hairer and Mattingly, that yields exponential convergence to 
the unique invariant probability measure in a certain Wasserstein distance. 

The appendix recalls the monotone class theorem and the few results from 
discrete-time martingales that are used in the book. 

More advanced textbooks include the excellent classical books by Meyn and 
Tweedie [49] and Duflo [22] and the more recent book by Douc, Moulines, Priouret, 
and Soulier [20]. The lecture notes by Hairer [31] contain some similar material and 
are also highly recommended. 
Preliminaries


The general setting is the following. Throughout this book, we let $M$ denote
a separable (there exists a countable dense subset) metric space with metric $d$
(e.g., $\mathbb{R}$, $\mathbb{R}^n$) equipped with its Borel $\sigma$-field $\mathcal{B}(M)$. We let $B(M)$ (respectively
$C_b(M)$) denote the set of real-valued bounded measurable (respectively bounded
continuous) functions on $M$ equipped with the norm

\[
\|f\|_\infty := \sup_{x \in M} |f(x)|. \tag{1}
\]


If $\mu$ is a (nonnegative) measure on $M$ and $f \in L^1(\mu)$ (or $f \ge 0$ measurable), we
let

\[
\mu f := \int_M f(x)\,\mu(dx)
\]

denote the integral of $f$ with respect to $\mu$. The rest of the notation is introduced in
the main body of the text. Please also refer to the list of symbols at the end of the 
book. 
Chapter 1
Markov Chains


This chapter introduces the basic objects of the book: Markov kernels and Markov 
chains. The Chapman-Kolmogorov equation, which characterizes the evolution of 
the law of a Markov chain, as well as the Markov and strong Markov properties are 
established. The last section briefly defines continuous-time Markov processes. 


1.1 Markov Kernels 


A Markov kernel on $M$ is a family of measures

\[
P = \{P(x, \cdot)\}_{x \in M}
\]

such that

(i) For all $x \in M$, $P(x, \cdot) : \mathcal{B}(M) \to [0, 1]$ is a probability measure;
(ii) For all $G \in \mathcal{B}(M)$, the mapping $x \in M \mapsto P(x, G) \in \mathbb{R}$ is measurable.


The Markov kernel $P$ acts on functions $g \in B(M)$ and measures (respectively
probability measures) $\mu$ according to the formulae:

\[
Pg(x) := \int_M P(x, dy)\,g(y), \tag{1.1}
\]
\[
\mu P(G) := \int_M \mu(dx)\,P(x, G). \tag{1.2}
\]
Remark 1.1 For all $g \in B(M)$, we have $Pg \in B(M)$ and $\|Pg\|_\infty \le \|g\|_\infty$.
Boundedness is immediate and measurability easily follows from condition (ii)
defining a Markov kernel (use for example the monotone class theorem from the
appendix).


Remark 1.2 The term $Pg(x)$ can also be defined by (1.1) for measurable functions
$g : M \to \mathbb{R}$ that are nonnegative, but not necessarily bounded. For such $g$, $Pg(x)$
is an element of $[0, \infty]$. This will play a role in the study of Lyapunov functions
starting in Sect. 2.3.


We let $P^n$ denote the operator recursively defined by $P^0 g := g$ and $P^{n+1} g :=
P(P^n g)$ for $n \in \mathbb{N}$. Or, equivalently,

\[
P^0(x, \cdot) := \delta_x \quad \text{and} \quad P^{n+1}(x, G) := \int_M P^n(x, dy)\,P(y, G)
\]

for all $n \in \mathbb{N}$ and for all $G \in \mathcal{B}(M)$. Here and throughout this book, $\mathbb{N}$ is the set
of nonnegative integers (including 0). The set of positive integers (excluding 0) will
be denoted by $\mathbb{N}^*$.


Example 1.3 (Countable Space) Suppose $M$ is countable. We can turn $M$ into a
separable (and complete) metric space by endowing it with the discrete metric
$d(x, y) = \mathbf{1}_{x \neq y}$. The corresponding Borel $\sigma$-field is the collection of all subsets
of $M$. A Markov transition matrix on $M$ is a map $P : M \times M \to [0, 1]$ such that

\[
\sum_{y \in M} P(x, y) = 1
\]

for all $x \in M$. This gives rise to a Markov kernel $Q$ defined by

\[
Q(x, G) := \sum_{y \in G} P(x, y)
\]

for all $G \subset M$. Since there is a one-to-one correspondence between transition
matrices and kernels on M, we shall identify P with Q and refer to it at times 
as a transition matrix and at times as a kernel. 
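
On a finite state space, the action of a kernel on functions and measures given by formulae (1.1) and (1.2) reduces to matrix-vector products. The following minimal sketch illustrates this in plain Python; the 3-state matrix `P`, the function `g`, the measure `mu`, and all helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical 3-state transition matrix; row x is the probability measure P(x, .).
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

def apply_to_function(P, g):
    """Return Pg, where Pg(x) = sum_y P(x, y) g(y), as in (1.1)."""
    return [sum(p_xy * g_y for p_xy, g_y in zip(row, g)) for row in P]

def apply_to_measure(mu, P):
    """Return mu P, where (mu P)(y) = sum_x mu(x) P(x, y), as in (1.2)."""
    return [sum(mu[x] * P[x][y] for x in range(len(P))) for y in range(len(P[0]))]

def kernel_power(P, n):
    """Return P^n using the recursion P^0 = identity, P^{n+1}(x, .) = P^n(x, .) P."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = [apply_to_measure(row, P) for row in result]
    return result

g = [1.0, -2.0, 3.0]   # a bounded function on M = {0, 1, 2}
mu = [1.0, 0.0, 0.0]   # the Dirac measure at state 0
Pg = apply_to_function(P, g)
muP = apply_to_measure(mu, P)
```

In this sketch one can check numerically that $\|Pg\|_\infty \le \|g\|_\infty$, as stated in Remark 1.1, and that $\mu(Pg) = (\mu P)g$.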


1.2 Markov Chains 


In order to define Markov chains, we first need to introduce the (classical) notions of
filtration and adapted processes. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A filtration
$\mathbb{F} = (\mathcal{F}_n)_{n \ge 0}$ is an increasing sequence of $\sigma$-fields: $\mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \mathcal{F}$ for all $n \in \mathbb{N}$.
The data $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is called a filtered probability space. An $M$-valued adapted
stochastic process on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is a family $(X_n)_{n \ge 0}$ of random variables defined
on $(\Omega, \mathcal{F}, \mathbb{P})$, taking values in $M$ and such that $X_n$ is $\mathcal{F}_n$-measurable for all $n \in
\mathbb{N}$. If $X = (X_n)_{n \ge 0}$ is a family of random variables on $(\Omega, \mathcal{F}, \mathbb{P})$, the canonical
filtration of $X$ is the filtration $\mathbb{F}^X = (\mathcal{F}^X_n)_{n \ge 0}$ where $\mathcal{F}^X_n = \sigma(X_0, \ldots, X_n)$ is the
$\sigma$-field generated by $X_0, \ldots, X_n$. With such a definition $X$ is always an adapted
stochastic process on $(\Omega, \mathcal{F}, \mathbb{F}^X, \mathbb{P})$.

We can now define what a Markov chain is. Given a filtered probability space
$(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ and a Markov kernel $P$ on $M$, a Markov chain with kernel $P$ with
respect to $\mathbb{F}$ is an $M$-valued adapted stochastic process $(X_n)$ on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ such
that

\[
\mathbb{P}(X_{n+1} \in G \mid \mathcal{F}_n) = P(X_n, G)
\]

for all $n \in \mathbb{N}$ and for all $G \in \mathcal{B}(M)$. Equivalently,

\[
\mathbb{E}(g(X_{n+1}) \mid \mathcal{F}_n) = Pg(X_n)
\]

for all $n \in \mathbb{N}$ and for all $g \in B(M)$ (or all functions $g : M \to \mathbb{R}$ that are measurable
and nonnegative). Here, $\mathbb{E}(\cdot \mid \mathcal{F}_n)$ denotes conditional expectation with respect to $\mathcal{F}_n$,
and $\mathbb{P}(X_{n+1} \in G \mid \mathcal{F}_n) := \mathbb{E}(\mathbf{1}_{X_{n+1} \in G} \mid \mathcal{F}_n)$. In the appendix, we recall the definition
of conditional expectation and list some of its basic properties, which will be used
without further comment throughout the text.


Proposition 1.4 Let $(X_n)$ be a Markov chain with kernel $P$ with respect to $\mathbb{F}$. Then
$(X_n)$ is always a Markov chain with kernel $P$ with respect to $\mathbb{F}^X$. This latter property
is equivalent to

\[
\mathbb{E}\bigl(g(X_{n+1})\,h_0(X_0) \cdots h_n(X_n)\bigr) = \mathbb{E}\bigl(Pg(X_n)\,h_0(X_0) \cdots h_n(X_n)\bigr)
\]

for all $n \in \mathbb{N}$, $h_0, \ldots, h_n \in B(M)$, and $g \in B(M)$.


Proof Suppose that $(X_n)$ is a Markov chain with kernel $P$ with respect to $\mathbb{F}$. Since
$\mathcal{F}^X_n \subset \mathcal{F}_n$,

\[
\mathbb{E}(g(X_{n+1}) \mid \mathcal{F}^X_n) = \mathbb{E}\bigl(\mathbb{E}(g(X_{n+1}) \mid \mathcal{F}_n) \mid \mathcal{F}^X_n\bigr) = Pg(X_n).
\]

This proves the first statement. Multiplying the left-hand side and right-hand side
of this equality by $h_0(X_0) \cdots h_n(X_n)$ and taking expected value shows the forward
implication of the second statement. The backward implication follows from the
definition of conditional expectation. □


Remark 1.5 In view of Proposition 1.4, when we say that $(X_n)$ is a Markov chain
with kernel $P$, we implicitly mean that it is a Markov chain with respect to $\mathbb{F}^X$.
Given a Markov kernel $P$ and a probability measure $\nu$ on $M$, there always exists
a Markov chain $(X_n)$ with kernel $P$ and such that $X_0$ has law $\nu$. As outlined in
Proposition 1.8 (ii), this follows from the Ionescu-Tulcea theorem.


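Proposition 1.6 (Chapman-Kolmogorov Equation) Let $(X_n)$ be a Markov chain
with kernel $P$. Let $\mu_n$ denote the law of $X_n$. Then, for every $n \in \mathbb{N}$,

\[
\mu_{n+1} = \mu_n P = \mu_0 P^{n+1}.
\]

Proof For every $g \in B(M)$,

\[
\mu_{n+1} g = \mathbb{E}(g(X_{n+1})) = \mathbb{E}\bigl(\mathbb{E}(g(X_{n+1}) \mid \mathcal{F}_n)\bigr) = \mathbb{E}(Pg(X_n)) = \mu_n Pg.
\]
□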


Example 1.7 (Countable Space) Let $(X_n)$ be a Markov chain on a countable state
space $M$, with transition matrix $P$ and initial distribution $\mu_0$. The law $\mu_n$ of the
random variable $X_n$ then satisfies

\[
\mu_n(x) = \sum_{y \in M} \mu_0(y)\,P^n(y, x), \quad \forall x \in M,
\]

where $P^n$ is the $n$th power of the matrix $P$. In matrix-vector notation, this identity
can be written as

\[
\mu_n = \mu_0 P^n,
\]

where $\mu_n$ and $\mu_0$ are row vectors. In particular, if $\mu_0$ is the Dirac measure at a point
$y \in M$, then the law of $X_n$ assigns mass $P^n(y, x)$ to every singleton $\{x\}$, i.e.,

\[
\mathbb{P}(X_n = x \mid X_0 = y) = P^n(y, x).
\]
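
As a small illustration of the Chapman-Kolmogorov equation, the sketch below propagates an initial law through a two-state chain using exact rational arithmetic and compares the result with the classical closed form for two-state chains. The jump probabilities `a` and `b` and the function names are hypothetical choices made for this example only.

```python
from fractions import Fraction

def step(mu, P):
    """One application of the kernel to a row vector: (mu P)(x) = sum_y mu(y) P(y, x)."""
    return [sum(mu[y] * P[y][x] for y in range(len(P))) for x in range(len(P[0]))]

def law_at_time(mu0, P, n):
    """Law of X_n obtained by iterating mu_{k+1} = mu_k P, starting from mu0."""
    mu = list(mu0)
    for _ in range(n):
        mu = step(mu, P)
    return mu

# Hypothetical two-state chain: jump 0 -> 1 with probability a, 1 -> 0 with probability b.
a, b = Fraction(1, 3), Fraction(1, 6)
P = [[1 - a, a], [b, 1 - b]]
mu0 = [Fraction(1), Fraction(0)]   # Dirac measure at state 0

mu5 = law_at_time(mu0, P, 5)
# Closed form for P^n(0, 0) in a two-state chain.
closed_form = b / (a + b) + a / (a + b) * (1 - a - b) ** 5
```

Here `mu5[0]` equals $P^5(0, 0)$, in agreement with the last identity of Example 1.7, and running two steps followed by three steps gives the same law as running five steps at once.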


Feller and Strong Feller Chains 

The Markov kernel $P$ (or the associated Markov chain $(X_n)$) is said to be Feller if
it takes bounded continuous functions into bounded continuous functions. It is said 
to be strong Feller if it takes bounded Borel functions into bounded continuous 
functions. If M is countable and equipped with the discrete metric, then every 
function on M is continuous. In particular, every Markov kernel on a countable 
set is strong Feller. 
1.3 The Canonical Chain


Let $X = (X_n)_{n \ge 0}$ be a Markov chain with kernel $P$. Then $X$ can be seen as a
random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ taking values in the space of trajectories

\[
M^{\mathbb{N}} := \{\mathbf{x} = (x_i)_{i \in \mathbb{N}} : x_i \in M\}
\]

equipped with the product $\sigma$-field $\mathcal{B}(M)^{\otimes \mathbb{N}}$ (see Exercise 1.9).
If $X_0$ has law $\nu$, we let $\mathbb{P}_\nu$ denote the law of $X$, which is the image measure of $\mathbb{P}$
by $X$. In particular, for all Borel sets $A_0, \ldots, A_k \subset M$,

\[
\mathbb{P}(X_0 \in A_0, \ldots, X_k \in A_k) = \mathbb{P}_\nu\{\mathbf{x} \in M^{\mathbb{N}} : (x_0, \ldots, x_k) \in A_0 \times \cdots \times A_k\}. \tag{1.3}
\]

We let $\mathbb{E}_\nu$ denote the corresponding expectation. If $\nu$ is the Dirac measure at $x$, we
use the standard notation $\mathbb{P}_x := \mathbb{P}_{\delta_x}$ and $\mathbb{E}_x := \mathbb{E}_{\delta_x}$.


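Proposition 1.8
(i) Let $X = (X_n)_{n \ge 0}$ be a Markov chain with kernel $P$ and initial distribution $\nu$.
Then, for all Borel sets $A_0, \ldots, A_k \subset M$,

\[
\mathbb{P}_\nu\{\mathbf{x} \in M^{\mathbb{N}} : (x_0, \ldots, x_k) \in A_0 \times \cdots \times A_k\}
= \int_{A_0} \nu(dx_0) \int_{A_1} P(x_0, dx_1) \cdots \int_{A_k} P(x_{k-1}, dx_k). \tag{1.4}
\]

(ii) Let $\Omega = M^{\mathbb{N}}$ and let $\mathcal{F} = \mathcal{B}(M)^{\otimes \mathbb{N}}$. Given a probability measure $\nu$ and a
Markov kernel $P$ on $M$, there exists a unique probability measure $\mathbb{P}_\nu$ on $(\Omega, \mathcal{F})$
characterized by (1.4). On $(\Omega, \mathcal{F})$, the process $(X_n)_{n \ge 0}$ defined by $X_n(\mathbf{x}) = x_n$
is a Markov chain with kernel $P$ and initial law $\nu$, called the canonical chain.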


Proof Given $k \in \mathbb{N}$ and $h_0, \ldots, h_k \in B(M)$, we let $h_0 \otimes \cdots \otimes h_k$ denote the map
on $M^{\mathbb{N}}$ defined as

\[
h_0 \otimes \cdots \otimes h_k(\mathbf{x}) := h_0(x_0) \cdots h_k(x_k).
\]

For further reference such a map will be called a product map of length $k + 1$. Then

\[
\mathbb{E}(h_0(X_0) \cdots h_k(X_k)) = \mathbb{E}_\nu(h_0 \otimes \cdots \otimes h_k)
= \mathbb{E}_\nu(h_0 \otimes \cdots \otimes h_{k-1} P h_k) = \nu\bigl[h_0 P[h_1 P[\ldots h_{k-1} P h_k] \ldots]\bigr]. \tag{1.5}
\]

The first equality is by definition of $\mathbb{E}_\nu$. The second equality follows from
Proposition 1.4 and the last one follows from the second one by induction on $k$.
This proves the first statement.
The existence of a unique probability measure $\mathbb{P}_\nu$ on $(\Omega, \mathcal{F})$ characterized
by (1.4) is the celebrated Ionescu-Tulcea theorem (see, e.g., Theorem 2 in Chapter
II.9 of [63]). Using the result from Exercise 1.9, it is not hard to check that
the canonical process $(X_n)$ is a Markov chain on the filtered probability space
$(\Omega, \mathcal{F}, \mathbb{F}^X, \mathbb{P}_\nu)$, with initial distribution $\nu$ and kernel $P$. □


Exercise 1.9 Let $\mathcal{B}(M^n)$ (respectively $\mathcal{B}(M^{\mathbb{N}})$) denote the Borel $\sigma$-field over $M^n$
(respectively $M^{\mathbb{N}}$, endowed with the product topology). Let $\mathcal{B}(M)^{\otimes n}$ (respectively
$\mathcal{B}(M)^{\otimes \mathbb{N}}$) denote the product $\sigma$-field over $M^n$ (respectively $M^{\mathbb{N}}$). Show that
$\mathcal{B}(M)^{\otimes n} = \mathcal{B}(M^n)$ and $\mathcal{B}(M)^{\otimes \mathbb{N}} = \mathcal{B}(M^{\mathbb{N}})$.

Hint: For the inclusion $\subset$ one can use the fact that the projection $\pi_i : M^{\mathbb{N}} \to
M$, $\mathbf{x} \mapsto x_i$ is continuous, hence measurable. Observe that this doesn't require
the separability of $M$. For the converse inclusion, one can first show, using
separability, that every open subset of $M^n$ is a countable union of product sets
$O_1 \times \cdots \times O_n$ with $O_i$ open.


1.4 Markov and Strong Markov Properties 


For $n \in \mathbb{N}$, we let $\Theta^n : M^{\mathbb{N}} \to M^{\mathbb{N}}$ denote the shift operator defined by
$\Theta^n(\mathbf{x}) := (x_{n+k})_{k \ge 0}$.


The following proposition known as the Markov property easily follows from the 
definitions. 


Proposition 1.10 (Markov Property) Let $H : M^{\mathbb{N}} \to \mathbb{R}$ be a nonnegative or
bounded measurable function and $X$ a Markov chain with kernel $P$. Then

\[
\mathbb{E}(H(\Theta^n \circ X) \mid \mathcal{F}_n) = \mathbb{E}_{X_n}(H).
\]


Proof Assume without loss of generality that $H$ is bounded. Indeed, if $H$ is
nonnegative and unbounded, there is an increasing sequence of bounded nonnegative
functions that converges pointwise to $H$, and one can apply the monotone
convergence theorem. The set of bounded $H$ satisfying the required property is a
vector space, containing the constant functions and closed under bounded monotone
convergence. Therefore, by the monotone class theorem (given in the appendix) and
by Exercise 1.9, it suffices to check the property when $H = h_0 \otimes \cdots \otimes h_k$ is a
product map. We proceed by induction on $k$. If $k = 0$, this is immediate. If the
property holds for all product maps of length $k + 1$, then

\[
\begin{aligned}
\mathbb{E}\bigl(h_0(X_n) \cdots h_{k+1}(X_{n+k+1}) \mid \mathcal{F}_n\bigr)
&= \mathbb{E}\bigl(h_0(X_n) \cdots h_k(X_{n+k})\,\mathbb{E}(h_{k+1}(X_{n+k+1}) \mid \mathcal{F}_{n+k}) \mid \mathcal{F}_n\bigr) \\
&= \mathbb{E}\bigl(h_0(X_n) \cdots h_k(X_{n+k})\,P h_{k+1}(X_{n+k}) \mid \mathcal{F}_n\bigr)
= \mathbb{E}_{X_n}(h_0 \otimes \cdots \otimes h_k P h_{k+1}).
\end{aligned}
\]

By (1.5), this last term equals $\mathbb{E}_{X_n}(h_0 \otimes \cdots \otimes h_{k+1})$. □


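A stopping time on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is a random variable
$T : \Omega \to \mathbb{N} \cup \{\infty\}$ such that for all $n \in \mathbb{N}$, the event $\{T = n\} = T^{-1}(\{n\})$ lies in
$\mathcal{F}_n$. The $\sigma$-field generated by $T$, denoted $\mathcal{F}_T$, is the $\sigma$-field consisting of all events
$A \in \mathcal{F}$ such that

\[
A \cap \{T = n\} \in \mathcal{F}_n, \quad \forall n \in \mathbb{N}.
\]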


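Exercise 1.11

(i) Show that $\mathcal{F}_T$ is indeed a $\sigma$-field.

(ii) Let $(T_n)_{n \in \mathbb{N}}$ be a sequence of stopping times on a filtered probability space
$(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ such that $T_n \le T_{n+1}$ for every $n \in \mathbb{N}$. Show that $\mathcal{A}_n := \mathcal{F}_{T_n}$,
$n \in \mathbb{N}$, defines a filtration on $(\Omega, \mathcal{F}, \mathbb{P})$.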


The following proposition generalizes Proposition 1.10. 


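Proposition 1.12 (Strong Markov Property) Let $H : M^{\mathbb{N}} \to \mathbb{R}$ be a nonnegative
or bounded measurable function, $X$ a Markov chain, and $T$ a stopping time living
on the same filtered probability space as $X$. Then

\[
\mathbb{E}(H(\Theta^T \circ X) \mid \mathcal{F}_T)\,\mathbf{1}_{T < \infty} = \mathbb{E}_{X_T}(H)\,\mathbf{1}_{T < \infty}.
\]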


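Proof It suffices to show that for all $n \in \mathbb{N}$,

\[
\mathbb{E}(H(\Theta^n \circ X)\,\mathbf{1}_{T = n} \mid \mathcal{F}_T) = \mathbb{E}_{X_n}(H)\,\mathbf{1}_{T = n}.
\]

The right-hand side is $\mathcal{F}_T$-measurable, and for all $A \in \mathcal{F}_T$,

\[
\mathbb{E}(H(\Theta^n \circ X)\,\mathbf{1}_{T = n}\mathbf{1}_A) = \mathbb{E}(\mathbb{E}_{X_n}(H)\,\mathbf{1}_{T = n}\mathbf{1}_A)
\]

by the Markov property (because $\mathbf{1}_{T = n}\mathbf{1}_A$ is $\mathcal{F}_n$-measurable). This proves the result.
□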
1.5 Continuous Time: Markov Processes


Although this book is about Markov chains in discrete time, it is useful to say a 
few words about Markov chains in continuous time, also called Markov processes, 
because they appear in many examples throughout the book. The definitions are 
modeled on discrete time. A Markov semigroup on $M$ is a family $\{P_t\}_{t \ge 0}$ of Markov
kernels on $M$ such that

(i) $P_0(x, \cdot) = \delta_x$;
(ii) For all $G \in \mathcal{B}(M)$, the mapping $(t, x) \mapsto P_t(x, G)$ is measurable;
(iii) For all $t, s \ge 0$, $P_{t+s} = P_t \circ P_s$.


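Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ be a continuous-time
filtration, i.e., a family of $\sigma$-fields such that $\mathcal{F}_s \subset \mathcal{F}_t \subset \mathcal{F}$ for all $0 \le s \le t$.
An $M$-valued adapted stochastic process on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is a family $(X_t)_{t \ge 0}$ of
random variables defined on $(\Omega, \mathcal{F}, \mathbb{P})$, taking values in $M$ and such that $X_t$ is $\mathcal{F}_t$-measurable
for all $t \ge 0$.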

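A Markov process with semigroup $\{P_t\}_{t \ge 0}$ with respect to $\mathbb{F}$ is an adapted
stochastic process $X = (X_t)_{t \ge 0}$ on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ such that for all $g \in B(M)$ and
$t, s \ge 0$,

\[
\mathbb{E}(g(X_{t+s}) \mid \mathcal{F}_t) = (P_s g)(X_t).
\]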


Exercise 1.13 Suppose $M$ is countable. Let $(Y_n)$ be a Markov chain on $M$ with
kernel $P$. Let $U_1, U_2, \ldots$ be a sequence of independent identically distributed
random variables on $(0, \infty)$ having an exponential distribution of parameter $\lambda$, i.e.,
$\mathbb{P}(U_i > t) = e^{-\lambda t}$. Set $T_0 = 0$ and $T_n = U_1 + \cdots + U_n$ for $n \ge 1$. Let $(X_t)_{t \ge 0}$
be the continuous-time process defined by $X_t = Y_n$ for $T_n \le t < T_{n+1}$. Show that
$(X_t)$ is a Markov process with semigroup

\[
P_t = e^{-\lambda t} e^{\lambda t P} = e^{-\lambda t} \sum_{k \ge 0} \frac{(\lambda t)^k P^k}{k!}.
\]
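
The semigroup of Exercise 1.13 can be explored numerically on a finite state space. The following sketch approximates $P_t$ by truncating the series and checks the semigroup property $P_{t+s} = P_t \circ P_s$ on one example. The 3-state matrix, the rate `lam`, the truncation level `terms`, and the function names are assumptions made for illustration; this is a numerical sanity check, not a solution of the exercise.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def semigroup(P, lam, t, terms=80):
    """Approximate P_t = exp(-lam t) * sum_k (lam t)^k P^k / k! by a truncated series."""
    n = len(P)
    result = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    weight = math.exp(-lam * t)  # Poisson weight exp(-lam t) (lam t)^k / k! for k = 0
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        power = mat_mul(power, P)       # P^{k+1}
        weight *= lam * t / (k + 1)     # weight for k + 1
    return result

# Hypothetical 3-state kernel (reflected walk on {0, 1, 2}) and jump rate.
P = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
lam = 2.0
P_s, P_t = semigroup(P, lam, 0.3), semigroup(P, lam, 0.5)
P_sum = semigroup(P, lam, 0.8)
P_prod = mat_mul(P_s, P_t)
max_gap = max(abs(P_sum[i][j] - P_prod[i][j]) for i in range(3) for j in range(3))
```

Up to floating-point and truncation error, `max_gap` is zero and each row of `P_sum` sums to one, as expected of a Markov semigroup.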


Feller Processes 
We use the following terminology. We say that the Markov semigroup $\{P_t\}_{t \ge 0}$ is
weak Feller provided that

(i) $P_t(C_b(M)) \subset C_b(M)$ for all $t \ge 0$;
(ii) For all $f \in C_b(M)$ and $x \in M$, $\lim_{t \to 0} P_t f(x) = f(x)$.
This definition implies that $P_t$ is Feller for all $t \ge 0$. Observe however that it is
weaker than the usual definition of a Feller semigroup (see, e.g., [26, 59] or [45]),
which assumes that

(i) $M$ is a locally compact metric space;
(ii) $\{P_t\}_{t \ge 0}$ is a strongly continuous semigroup on $C_0(M)$ (the set of continuous
functions vanishing at infinity), meaning that

(a) $P_t(C_0(M)) \subset C_0(M)$;
(b) For all $f \in C_0(M)$, $\lim_{t \to 0} \|P_t f - f\|_\infty = 0$.


Remark 1.14 It is proved in [59, Proposition 2.4] that [(a), (b)] above is equivalent
to [(a), (b)'] where (b)' is given by the (seemingly) weaker condition that

\[
\lim_{t \to 0} P_t f(x) = f(x)
\]

for all $f \in C_0(M)$ and $x \in M$. As shown by the following exercise, this equivalence
does not hold if $C_0(M)$ is replaced by $C_b(M)$.


Exercise 1.15 Let $M = (0, \infty)$, and let $P_t$ be defined on $B(M)$ as


xe! 
Pf (x) = (3) 


Show that $\{P_t\}_{t \ge 0}$ is a weak Feller Markov semigroup which is not Feller.


Chapter 2
Countable Markov Chains


This chapter presents the basic theory of countable Markov chains. The assumption
that $M$ is countable makes the proofs easier and makes it possible to introduce, in a
simple setting, some of the key notions (such as invariant probability measures,
irreducibility, positive recurrence, etc.) that will be revisited in the subsequent
chapters. Furthermore, some of the results given here, in particular in Sect. 2.6,
will be used later to prove the main results in Chap. 7. We assume here that $M$
is a countable set equipped with the $\sigma$-field $S$ of all subsets of $M$, and $(X_n)$ is
a Markov chain on $M$ with Markov kernel (or matrix) $P = (P(x, y))_{x, y \in M}$. In
most of this chapter, we assume without loss of generality that $\Omega = M^{\mathbb{N}}$, $\mathcal{F} =
S^{\otimes \mathbb{N}}$, $X_n(\omega) = \omega_n$, and $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$, i.e., $(X_n)$ is the canonical chain
introduced in Sect. 1.3.


2.1 Recurrence and Transience


For $x \in M$, we let

\[
\tau_x := \inf\{k \ge 1 : X_k = x\}
\]

denote the first time $\ge 1$ at which the chain hits $x$,

\[
\tau_x^{(n)} := \inf\{k > \tau_x^{(n-1)} : X_k = x\},
\]

the $n$th time of hitting $x$ (with $\tau_x^{(0)} := 0$), and

\[
N_x := \sum_{k \ge 1} \mathbf{1}_{X_k = x} \in \mathbb{N} \cup \{\infty\}
\]

the number of visits of $x$ at or after time 1. We adopt the convention that $\inf \emptyset =
+\infty$. A point $x$ is said to be recurrent if

\[
\mathbb{P}_x(\tau_x < \infty) = 1
\]

and transient otherwise.

Given $x, y \in M$ and $k \in \mathbb{N}^*$, we say that $x$ leads to $y$ in $k$ steps, written $x \rightsquigarrow^k y$,
if $P^k(x, y) > 0$. We say that $x$ leads to $y$, written $x \rightsquigarrow y$, if $x \rightsquigarrow^k y$ for some
$k \in \mathbb{N}^*$. The chain is called irreducible if $x \rightsquigarrow y$ for all $x, y \in M$. To any Markov
chain on a countable set $M$ with transition matrix $P$, one can associate a weighted
directed graph as follows: Let $M$ be the set of vertices. For any $x, y \in M$, not
necessarily distinct, there is a directed edge of weight $P(x, y)$ going from $x$ to $y$ if
and only if $P(x, y) > 0$. The chain is then irreducible if and only if the associated
directed graph is connected, i.e., for any $x, y \in M$ there is a path from vertex $x$ to
vertex $y$ that moves along directed edges. Note that a general notion of irreducibility
will be defined in Chap. 5 and that every countable irreducible chain (as defined
here) satisfies this general definition.
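
For a finite state space, the graph characterization of irreducibility can be checked mechanically by a breadth-first search along the edges with positive weight. The sketch below is one possible way to do so; the function names and the two small test matrices are hypothetical.

```python
from collections import deque

def reachable_from(P, x):
    """Return the set of states y with x ~> y, i.e. P^k(x, y) > 0 for some k >= 1."""
    seen = set()
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v, p_uv in enumerate(P[u]):
            # Follow the directed edge u -> v only if it has positive weight.
            if p_uv > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_irreducible(P):
    """A finite chain is irreducible when every state leads to every state."""
    states = set(range(len(P)))
    return all(reachable_from(P, x) == states for x in states)

# Hypothetical examples: a deterministic 3-cycle and a chain with an absorbing state.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
absorbing = [[1.0, 0.0], [0.5, 0.5]]
```

Note that `reachable_from(P, x)` contains `x` itself only when the chain can return to `x` in at least one step, in line with the requirement $k \in \mathbb{N}^*$ in the definition.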


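Proposition 2.1
(i) If $x$ is transient, then $N_x < \infty$ a.s. and for all $k \ge 0$,

\[
\mathbb{P}_x(N_x = k) = a^k(1 - a),
\]

where $a = \mathbb{P}_x(\tau_x < \infty)$. In particular,

\[
\mathbb{E}_x(N_x) = \sum_{k \ge 1} P^k(x, x) = \frac{a}{1 - a} < \infty.
\]

(ii) If $x$ is recurrent, then $\mathbb{P}_x(N_x = \infty) = 1$,

\[
\mathbb{E}_x(N_x) = \sum_{k \ge 1} P^k(x, x) = \infty,
\]

and

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}_{X_k = x} = \frac{1}{\mathbb{E}_x(\tau_x)}
\]

$\mathbb{P}_x$-a.s.
(iii) If the chain is irreducible, then either all points are recurrent or all points are
transient. In the recurrent case, for all $x, y \in M$,

\[
\mathbb{P}_x(\tau_y < \infty) = 1 \quad \text{and} \quad \mathbb{E}_x(N_y) = \infty.
\]

In the transient case, for all $x, y \in M$,

\[
\mathbb{E}_x(N_y) < \infty.
\]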


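Proof
(i) Using the strong Markov property,

\[
\mathbb{P}_x(N_x = k) = \mathbb{P}_x(\tau_x^{(k)} < \infty;\ \tau_x^{(k+1)} = \infty) = (1 - a)\,\mathbb{P}_x(\tau_x^{(k)} < \infty)
\]

and

\[
\mathbb{P}_x(\tau_x^{(k)} < \infty) = a\,\mathbb{P}_x(\tau_x^{(k-1)} < \infty) = \ldots = a^k.
\]

(ii) If $x$ is recurrent, then, using again the strong Markov property,

\[
\mathbb{P}_x(\tau_x^{(n)} < \infty) = \mathbb{P}_x(\tau_x^{(n-1)} < \infty) = \ldots = 1.
\]

Hence $\mathbb{P}_x(N_x = \infty) = 1$ and thus $\mathbb{E}_x(N_x) = \infty$.
For all $n \ge 1$, there exists $k(n) \ge 0$ such that $\tau_x^{(k(n))} \le n < \tau_x^{(k(n)+1)}$.
Furthermore, the random variables $(\tau_x^{(n+1)} - \tau_x^{(n)})_{n \ge 0}$ are, under $\mathbb{P}_x$, i.i.d.
Thus, by the strong law of large numbers for nonnegative i.i.d. random
variables,

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}_{X_k = x} = \lim_{n \to \infty} \frac{k(n)}{\tau_x^{(k(n))}} = \frac{1}{\mathbb{E}_x(\tau_x)}.
\]

(iii) If the chain is irreducible, for all $x, y \in M$ there exist $i, j \ge 1$ and $\varepsilon > 0$
such that $P^i(x, y) \ge \varepsilon$, $P^j(y, x) \ge \varepsilon$. Thus $P^{k+i+j}(x, x) \ge \varepsilon^2 P^k(y, y)$ for
all $k \ge 1$. Therefore, we have the implication

\[
\sum_{k \ge 1} P^k(y, y) = \infty \ \Rightarrow\ \sum_{k \ge 1} P^k(x, x) = \infty,
\]

proving that $x$ is recurrent whenever $y$ is recurrent and $y$ is transient whenever
$x$ is transient.

Suppose the chain is recurrent. Fix $x, y \in M$ such that $x \neq y$ (for $x = y$ the
statement holds trivially). By irreducibility, recurrence, and the strong Markov
property,

\[
\varepsilon := \mathbb{P}_x(\exists k < \tau_x : X_k = y) > 0.
\]

Thus, again using the strong Markov property,

\[
\begin{aligned}
\mathbb{P}_x(\tau_y > \tau_x^{(n+1)}) &= \mathbb{E}_x\bigl(\mathbb{P}_x(\tau_y > \tau_x^{(n+1)} \mid \mathcal{F}_{\tau_x^{(n)}})\bigr) \\
&= \mathbb{E}_x\bigl((1 - \mathbb{P}_x(\exists k < \tau_x : X_k = y))\,\mathbf{1}_{\tau_y > \tau_x^{(n)}}\bigr) \\
&= (1 - \varepsilon)\,\mathbb{P}_x(\tau_y > \tau_x^{(n)}) = \ldots = (1 - \varepsilon)^{n+1}.
\end{aligned}
\]

Thus $\mathbb{P}_x(\tau_y > \tau_x^{(n+1)}) \to 0$ as $n \to \infty$, showing that $\mathbb{P}_x(\tau_y < \infty) = 1$. The two
statements about $\mathbb{E}_x(N_y)$ follow from the identity

\[
\mathbb{E}_x(N_y) = \mathbb{P}_x(\tau_y < \infty)\,(1 + \mathbb{E}_y(N_y)),
\]

which itself follows from the strong Markov property, and is valid for both recurrent
and transient chains. □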
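
The identity $\mathbb{E}_x(N_x) = \sum_{k \ge 1} P^k(x, x) = a/(1 - a)$ of Proposition 2.1 (i) can be checked numerically on a small example. The sketch below uses a hypothetical 3-state chain in which state 2 is absorbing, so that state 0 is transient; both quantities are approximated by truncating the relevant series at a horizon chosen large enough for this example.

```python
def first_return_probability(P, x, horizon):
    """Approximate a = P_x(tau_x < infinity) by summing P_x(tau_x = n) for n <= horizon."""
    size = len(P)
    v = list(P[x])      # v[y] = P_x(X_1 = y)
    total = v[x]        # P_x(tau_x = 1)
    for _ in range(horizon - 1):
        # Advance one step from every state except x, so that paths which
        # already returned to x are not counted twice.
        v = [sum(v[z] * P[z][y] for z in range(size) if z != x) for y in range(size)]
        total += v[x]
    return total

def expected_visits(P, x, horizon):
    """Approximate E_x(N_x) = sum_{k >= 1} P^k(x, x), truncated at k = horizon."""
    size = len(P)
    row = list(P[x])    # row[y] = P^k(x, y), starting with k = 1
    total = row[x]
    for _ in range(horizon - 1):
        row = [sum(row[z] * P[z][y] for z in range(size)) for y in range(size)]
        total += row[x]
    return total

# Hypothetical chain on {0, 1, 2}; state 2 is absorbing, hence 0 and 1 are transient.
P = [[0.5, 0.25, 0.25], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]]
a = first_return_probability(P, 0, 200)
visits = expected_visits(P, 0, 200)
```

For this chain a first-step analysis gives $a = 1/2 + (1/4)(1/2) = 5/8$, so that $a/(1 - a) = 5/3$, which is what the truncated sum `visits` approaches.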


Remark 2.2 Transience does not imply that $\mathbb{P}_x(\tau_y < \infty) < 1$ for all $x, y$. Consider
the chain on $\mathbb{N}$ whose transition matrix is given by

\[
P(x, x+1) = p \in (1/2, 1), \quad P(x+1, x) = 1 - p \ \text{for all } x \in \mathbb{N}, \quad \text{and} \quad P(0, 0) = 1 - p.
\]

By the strong law of large numbers, $\mathbb{P}_x(\tau_y < \infty) = 1$ for all $x < y$ and the chain is
transient.


Example 2.3 (Pólya Walks) The Pólya walk on $\mathbb{Z}^d$ is the Markov chain with
transition matrix

\[
P(x, y) = \frac{1}{2d}\,\mathbf{1}_{x \sim y},
\]

where $x \sim y \Leftrightarrow \sum_{i=1}^{d} |x_i - y_i| = 1$. In 1921, Pólya proved that the associated
chain is recurrent for $d \le 2$ and transient for $d \ge 3$.
The proof for $d = 1$ goes as follows. Clearly

\[
P^{2k}(0, 0) = \frac{1}{2^{2k}} \binom{2k}{k}.
\]

Stirling's formula ($\ln(n!) = n(\ln(n) - 1) + \frac{1}{2}(\ln(n) + \ln(2\pi)) + o(1)$) then yields

\[
P^{2k}(0, 0) \sim \frac{1}{\sqrt{\pi k}}.
\]

This proves that $\sum_k P^k(0, 0) = \infty$, hence the recurrence.
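
The asymptotics used in the proof for $d = 1$ can be observed numerically. The following sketch computes $P^{2k}(0, 0) = \binom{2k}{k} 2^{-2k}$ exactly from the binomial formula and compares it with $1/\sqrt{\pi k}$; the function names and the chosen values of $k$ are for illustration only.

```python
import math

def return_probability(k):
    """P^{2k}(0, 0) for the Polya walk on Z: binom(2k, k) / 2^{2k}."""
    return math.comb(2 * k, k) / 4 ** k

def stirling_approximation(k):
    """The equivalent 1 / sqrt(pi k) obtained from Stirling's formula."""
    return 1.0 / math.sqrt(math.pi * k)

def partial_sum(K):
    """Partial sum of P^{2k}(0, 0) over 1 <= k <= K (odd powers vanish at 0)."""
    return sum(return_probability(k) for k in range(1, K + 1))

# The ratio tends to 1 as k grows, and the partial sums grow without bound.
ratio = return_probability(1000) / stirling_approximation(1000)
sums = [partial_sum(K) for K in (100, 200, 400)]
```

The partial sums grow roughly like $2\sqrt{K/\pi}$, which is the numerical counterpart of the divergence of $\sum_k P^k(0, 0)$.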
For $d = 2$, recurrence can be deduced from Exercise 2.4 below. The proof
of transience for $d = 3$ is slightly more involved and can be found in classical
textbooks (see, e.g., [7] or Woess's book [70] for a more advanced textbook on
Markov chains on graphs and groups).


Exercise 2.4 (Pólya Walks) Let $X_n = (X_n^1, \ldots, X_n^d)$, where the $(X_n^i)$, $i =
1, \ldots, d$, are independent Pólya walks on $\mathbb{Z}$. Show that $(X_n)$ is recurrent if and only
if $d \le 2$. Deduce from this result the recurrence of the Pólya walk on $\mathbb{Z}^2$.


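Exercise 2.5 (Generating Functions) Let $0 < p < 1$ and $q = 1 - p$. Consider the
biased walk on $\mathbb{Z}$ whose transition matrix is given by $P(x, x+1) = p$, $P(x, x-1) = q$
and $P(x, y) = 0$ for $|x - y| \neq 1$.

For all $0 \le t \le 1$ and $y \in \mathbb{Z}$, set

\[ U_y(t) = \mathbb{E}_0\big(t^{\tau_y}\, 1_{\{\tau_y < \infty\}}\big) \]

and

\[ G_y(t) = \mathbb{E}_0\Big(\sum_{k \ge 0} 1_{\{X_k = y\}}\, t^k\Big) = \sum_{k \ge 0} P^k(0, y)\, t^k. \]

(i) Prove the following identities:

\[ U_0(t) = t\,(pU_{-1}(t) + qU_1(t)), \quad U_1(t) = t\,(p + qU_2(t)), \quad U_{-1}(t) = t\,(q + pU_{-2}(t)), \]

\[ U_2(t) = U_1^2(t), \quad U_{-2}(t) = U_{-1}^2(t). \]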


(ii) Compute $U_0$, $G_0$ and show that

\[ \mathbb{E}_x(N_x) = \frac{1}{|1 - 2p|} - 1, \qquad \mathbb{E}_x(\tau_x \mid \tau_x < \infty) = \frac{2\max(p, q)}{|1 - 2p|}. \]

Comment on these results.


2.1.1 Positive Recurrence 


A recurrent point $x$ is called positive recurrent if $\mathbb{E}_x(\tau_x) < \infty$ and null recurrent
otherwise.
A measure (respectively a probability measure) $\pi$ on $M$ is called invariant for a
transition matrix $P$ if $\pi P = \pi$, or equivalently,

\[ \pi(x) = \sum_{y \in M} \pi(y) P(y, x) \]

for all $x \in M$. Here, we write $\pi(x)$ instead of $\pi(\{x\})$ to highlight the link
with matrix-vector notation. Precisely, if $M = \{1, \dots, N\}$ or $M = \mathbb{N}^*$, and if
$x \in M$, then $\pi(x)$ is the $x$th entry of the row vector $(\pi(\{1\}), \pi(\{2\}), \dots, \pi(\{N\}))$
or $(\pi(\{1\}), \pi(\{2\}), \dots)$. If $\pi$ is an invariant probability measure for $P$ and if $X_0$ is
distributed according to $\pi$, then $X_n$ is distributed according to $\pi$ for all $n \ge 1$ by
Proposition 1.6.

The next result shows that for an irreducible recurrent kernel, either all points are 
positive recurrent or all points are null recurrent. Moreover, positive recurrence is 
equivalent to the existence of an invariant probability measure. 


Theorem 2.6 Suppose $P$ is irreducible. Then the following assertions are equivalent:


(a) There exists an invariant probability measure $\pi$ for $P$;
(b) There exists a positive recurrent point.


Under these equivalent conditions:


(i) All the points are positive recurrent;
(ii) For every initial probability distribution $\nu$ on $M$ and $x \in M$,

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} 1_{\{X_k = x\}} = \pi(x) = \frac{1}{\mathbb{E}_x(\tau_x)} \]

$\mathbb{P}_\nu$-a.s. (in particular, $\pi$ is unique);
(iii) For all $x \in M$ and $f : M \to \mathbb{R}$ bounded or $f : M \to [0, \infty]$,

\[ \pi f = \frac{\mathbb{E}_x\big(\sum_{k=0}^{\tau_x - 1} f(X_k)\big)}{\mathbb{E}_x(\tau_x)}; \]

(iv) For all $x, y \in M$, $\mathbb{E}_y(\tau_x) < \infty$.


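Proof For all $x \in M$, $\sum_{k=1}^{n} 1_{\{X_k = x\}} = 1_{\{\tau_x < \infty\}} \sum_{k=1}^{n} 1_{\{X_k = x\}}$. Then, using
irreducibility and Proposition 2.1 (ii), one has for every probability measure $\nu$ on
$M$

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} 1_{\{X_k = x\}} = \frac{1_{\{\tau_x < \infty\}}}{\mathbb{E}_x(\tau_x)} \tag{2.1} \]

$\mathbb{P}_\nu$-a.s., with the convention that the right-hand term is zero if $x$ is transient. Suppose
now that $\pi$ is an invariant probability measure. By irreducibility and the relation
$\pi(x) = \sum_y \pi(y) P^n(y, x)$, one sees that $\pi(x) > 0$ for all $x \in M$. Taking $\mathbb{E}_\pi$-expectation
on both sides of (2.1) and using dominated convergence gives

\[ 0 < \pi(x) = \frac{\mathbb{P}_\pi(\tau_x < \infty)}{\mathbb{E}_x(\tau_x)}. \]

This implies $\mathbb{E}_x(\tau_x) < \infty$ so that $x$ is positive recurrent. By Proposition 2.1 (iii),
recurrence implies $\mathbb{P}_\pi(\tau_x < \infty) = 1$. Thus $\pi(x) = \frac{1}{\mathbb{E}_x(\tau_x)}$. Suppose now that there
exists a positive recurrent point $x$. Let $\pi$ be the probability measure defined as in
assertion (iii) of Theorem 2.6. We claim that $\pi$ is an invariant probability measure
(compare with Exercise 4.24). For all $f \in B(M)$,

\[ \mathbb{E}_x(\tau_x)\, \pi f = \mathbb{E}_x\Big(\sum_{k=0}^{\tau_x - 1} f(X_k)\Big) = \mathbb{E}_x\Big(\sum_{k \ge 0} 1_{\{\tau_x > k\}}\, f(X_{k+1})\Big) \]

because $f(X_{\tau_x}) = f(x)$. Thus, using the Markov property and Fubini's theorem,

\[ \mathbb{E}_x(\tau_x)\, \pi f = \sum_{k \ge 0} \mathbb{E}_x\big(\mathbb{E}_x(f(X_{k+1})\, 1_{\{\tau_x > k\}} \mid \mathcal{F}_k)\big) = \mathbb{E}_x\Big(\sum_{k \ge 0} 1_{\{\tau_x > k\}}\, Pf(X_k)\Big) = \mathbb{E}_x(\tau_x)\, \pi(Pf). \]

This shows that $\pi P f = \pi f$, hence $\pi P = \pi$.
It remains to prove assertion (iv). Let $x \neq y \in M$. By irreducibility one can
choose $k \ge 1$ such that $P^k(x, y) > 0$. Let $\tau_{k,x} := \inf\{n \ge k : X_n = x\}$. Then
$\tau_{k,x} \le \tau_x^{(k)}$ and, consequently,

\[ \mathbb{E}_x(\tau_{k,x}) \le \mathbb{E}_x(\tau_x^{(k)}) = \frac{k}{\pi(x)}. \]

Here the last equality follows from assertion (ii) and the strong Markov property.
By the Markov property,

\[ \mathbb{E}_x(\tau_{k,x}) = k + \mathbb{E}_x\big(\mathbb{E}_{X_k}(\tau_x)\, 1_{\{X_k \neq x\}}\big) \ge k + P^k(x, y)\, \mathbb{E}_y(\tau_x). \]

This shows that

\[ \mathbb{E}_y(\tau_x) \le \frac{k\,(1 - \pi(x))}{\pi(x)\, P^k(x, y)}. \]

□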


An irreducible kernel (or chain) satisfying one of the equivalent conditions (a) 
or (b) of Theorem 2.6 is called a positive recurrent kernel (chain). 
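
On a finite state space, the identity $\pi(x) = 1/\mathbb{E}_x(\tau_x)$ from Theorem 2.6 (ii) can be verified exactly. The Python sketch below is an added illustration with hypothetical helper names: it solves $\pi P = \pi$ with the normalisation $\sum_x \pi(x) = 1$, computes $\mathbb{E}_x(\tau_x)$ by first-step analysis, and works over the rationals so that the comparison is exact. It assumes the kernel passed in is irreducible, so that both linear systems are nonsingular.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def invariant_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (one balance equation is redundant)."""
    n = len(P)
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n - 1)]
    A.append([Fraction(1)] * n)
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    return solve(A, b)


def mean_return_time(P, x):
    """E_x(tau_x): h(y) = E_y(tau_x), y != x, solves h(y) = 1 + sum_{z != x} P(y,z) h(z)."""
    n = len(P)
    others = [y for y in range(n) if y != x]
    A = [[(1 if y == z else 0) - P[y][z] for z in others] for y in others]
    h = dict(zip(others, solve(A, [Fraction(1)] * len(others))))
    return 1 + sum(P[x][z] * h[z] for z in others)


if __name__ == "__main__":
    F = Fraction
    P = [[F(1, 2), F(1, 2), F(0)],
         [F(1, 4), F(1, 2), F(1, 4)],
         [F(0), F(1, 2), F(1, 2)]]
    pi = invariant_distribution(P)
    print(pi, [mean_return_time(P, x) for x in range(3)])
```

For this three-state example the invariant measure is $(1/4, 1/2, 1/4)$ and the mean return times are $4, 2, 4$.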
Corollary 2.7 If $M$ is finite and $P$ is irreducible, then $P$ is positive recurrent.


Proof The set $\mathcal{P}(M)$ of probability measures on $M$ is nothing but the unit simplex
in $\mathbb{R}^d$ with $d$ the cardinality of $M$. By Brouwer's fixed point theorem (see, e.g.,
Corollary XVI.2.2 in [23]), the map $\mathcal{P}(M) \ni \pi \mapsto \pi P \in \mathcal{P}(M)$ has a fixed point,
which is then an invariant probability measure for $P$. □


Remark 2.8 The proof of Corollary 2.7 shows that every Markov chain on a finite 
set, possibly non-irreducible, always admits (at least) one invariant probability 
measure. 


Exercise 2.9 Give a direct proof of this latter fact. Hint: Consider the sequence
$(\mu_n)$ defined by $\mu_n = \frac{1}{n} \sum_{k=1}^{n} \mu P^k$, where $\mu$ is some probability measure.
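
To see why the hint uses Cesàro averages rather than the iterates $\mu P^n$ themselves, consider the two-state chain that flips deterministically. The Python sketch below is an added illustration (function names are ours): the iterates $\mu P^n$ oscillate forever, while the averages $\mu_n$ approach the invariant measure $(1/2, 1/2)$.

```python
from fractions import Fraction


def step(mu, P):
    """Return the row vector mu P."""
    n = len(P)
    return [sum(mu[y] * P[y][x] for y in range(n)) for x in range(n)]


def cesaro_average(mu, P, n):
    """Return mu_n = (1/n) * sum_{k=1}^{n} mu P^k, computed exactly."""
    total = [Fraction(0)] * len(mu)
    current = list(mu)
    for _ in range(n):
        current = step(current, P)
        total = [t + c for t, c in zip(total, current)]
    return [t / n for t in total]


if __name__ == "__main__":
    flip = [[0, 1], [1, 0]]      # deterministic flip between two states
    mu = [1, 0]                  # start in state 0
    print(step(mu, flip), step(step(mu, flip), flip))   # oscillates
    print(cesaro_average(mu, flip, 10), cesaro_average(mu, flip, 11))
```

This is only a numerical hint; the exercise still asks for a proof that limit points of $(\mu_n)$ are invariant.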


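An interesting consequence of Theorem 2.6 (iii) is the next proposition, which
relates moments of the first return time to $x$ to $\pi$-mean moments of the hitting time
of $x$.


Proposition 2.10 Suppose $P$ is positive recurrent with invariant probability measure
$\pi$. Then, for every nonnegative function $\psi : \mathbb{N} \to \mathbb{R}_+$ and every $x \in M$,

\[ \mathbb{E}_\pi(\psi(\tau_x)) = \pi(x)\, \mathbb{E}_x\Big(\sum_{k=1}^{\tau_x} \psi(k)\Big). \]

In particular, for every $\lambda > 0$,

\[ \mathbb{E}_\pi(e^{\lambda \tau_x}) = \pi(x)\, \frac{e^{\lambda}}{e^{\lambda} - 1}\, \big[\mathbb{E}_x(e^{\lambda \tau_x}) - 1\big]; \]

and for every $p > 0$,

\[ \mathbb{E}_\pi(\tau_x^p) \le \pi(x)\, \frac{\mathbb{E}_x\big[(\tau_x + 1)^{p+1}\big] - 1}{p + 1}. \]


Proof Fix $\psi : \mathbb{N} \to \mathbb{R}_+$ and $x \in M$. By Theorem 2.6 (iii) applied to $f(y) :=
\mathbb{E}_y(\psi(\tau_x))$, one has

\[ \mathbb{E}_\pi(\psi(\tau_x)) = \pi(x)\, \mathbb{E}_x\Big(\sum_{k \ge 0} 1_{\{\tau_x > k\}}\, \mathbb{E}_{X_k}(\psi(\tau_x))\Big) = \pi(x) \sum_{k \ge 0} \mathbb{E}_x\big(1_{\{\tau_x > k\}}\, \mathbb{E}_{X_k}(\psi(\tau_x))\big). \]

But, by the Markov property,

\[ \mathbb{E}_x\big(1_{\{\tau_x > k\}}\, \mathbb{E}_{X_k}(\psi(\tau_x))\big) = \mathbb{E}_x\big(\mathbb{E}_x(\psi(\tau_x - k)\, 1_{\{\tau_x > k\}} \mid \mathcal{F}_k)\big) = \mathbb{E}_x\big(\psi(\tau_x - k)\, 1_{\{\tau_x > k\}}\big). \]

This proves the result. □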
Exercise 2.11 (Pólya Walks, Continued) Show that the Pólya walks on $\mathbb{Z}$ and
$\mathbb{Z}^2$ are null recurrent. Hint: Show that they do not have any invariant probability
measure.


Exercise 2.12 (Reflected Walks) Let $0 < p < 1$, $q = 1 - p$ and $0 < r < 1$.
Consider the chain on $\mathbb{N}$ whose transition matrix is given by $P(x, x+1) = p$,
$P(x, x-1) = q$ if $x \ge 1$, $P(0, 0) = r$ and $P(0, 1) = 1 - r$. With the notation of
Exercise 2.5 compute $U_0(t)$ and show that the chain is transient for $p > 1/2$, null
recurrent for $p = 1/2$ and positive recurrent for $p < 1/2$. Compute $\mathbb{E}_0(\tau_0 \mid \tau_0 < \infty)$.


Exercise 2.13 (Harmonic Functions) A function $h : M \to \mathbb{R}$ is called harmonic
for the Markov kernel $P$ if $Ph = h$. Suppose $P$ is irreducible and recurrent.
Show that every nonnegative or bounded harmonic function is constant. (Hint:
Show that $h(X_n)$ is a nonnegative (or bounded) martingale, hence convergent by
Theorem A.6.) Give an example of a nonconstant unbounded harmonic function for
the Pólya walk on $\mathbb{Z}$.


Exercise 2.14 (Reversibility) Let $\pi$ be a probability measure on $M$. A Markov
kernel $P$ is said to be reversible with respect to $\pi$ if $\pi(x) P(x, y) = \pi(y) P(y, x)$
for all $x, y \in M$.


(i) Show that if $P$ is reversible with respect to $\pi$, then $\pi$ is invariant for $P$.

(ii) Show that if $P$ is reversible with respect to $\pi$ and if $\pi(x) > 0$ for all $x \in M$,
then $Pf(x) := \sum_{y \in M} P(x, y) f(y)$ defines a self-adjoint operator on the
Hilbert space $\ell^2(\pi) := \{f : M \to \mathbb{R} : \sum_{x \in M} \pi(x) |f(x)|^2 < \infty\}$ with
inner product $\langle f, g \rangle := \sum_{x \in M} \pi(x) f(x) g(x)$, i.e., $\langle Pf, g \rangle = \langle f, Pg \rangle$ for all
$f, g \in \ell^2(\pi)$.

(iii) Give an example of a Markov kernel $P$ and a probability measure $\pi$ such that
$\pi$ is invariant for $P$, but $P$ is not reversible with respect to $\pi$.
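
The two notions in Exercise 2.14 are easy to test on small matrices. The Python sketch below is an added illustration with hypothetical function names; it checks the detailed balance relation $\pi(x)P(x,y) = \pi(y)P(y,x)$ and the invariance relation $\pi P = \pi$ exactly, on a birth-death chain and on a deterministic three-cycle chosen by us as test cases.

```python
from fractions import Fraction


def is_reversible(P, pi):
    """Check detailed balance pi(x) P(x, y) == pi(y) P(y, x) for all x, y."""
    n = len(P)
    return all(pi[x] * P[x][y] == pi[y] * P[y][x]
               for x in range(n) for y in range(n))


def is_invariant(P, pi):
    """Check pi P == pi, i.e. pi(x) == sum_y pi(y) P(y, x) for all x."""
    n = len(P)
    return all(sum(pi[y] * P[y][x] for y in range(n)) == pi[x]
               for x in range(n))


if __name__ == "__main__":
    F = Fraction
    birth_death = [[F(1, 2), F(1, 2), 0],
                   [F(1, 4), F(1, 2), F(1, 4)],
                   [0, F(1, 2), F(1, 2)]]
    pi = [F(1, 4), F(1, 2), F(1, 4)]
    print(is_reversible(birth_death, pi), is_invariant(birth_death, pi))

    cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0
    uniform = [F(1, 3)] * 3
    print(is_invariant(cycle, uniform), is_reversible(cycle, uniform))
```

Such a check does not replace the proofs asked for in (i) and (ii); it only provides concrete instances to experiment with.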


2.1.2 Null Recurrence 


Although an irreducible null recurrent chain has no invariant probability measure 
(for otherwise it would be positive recurrent) it always has an unbounded invariant 
measure. 


Theorem 2.15 Suppose $P$ is irreducible and null recurrent. Given $x \in M$, let $\pi$ be
the measure on $M$ defined by

\[ \pi f = \mathbb{E}_x\Big(\sum_{k=0}^{\tau_x - 1} f(X_k)\Big) \]

for $f : M \to \mathbb{R}$ nonnegative. Then $\pi$ is $\sigma$-finite ($\pi(y) < \infty$ for all $y \in M$),
positive ($\pi(y) > 0$ for all $y \in M$), unbounded ($\pi(M) = \infty$), and invariant under
$P$ ($\pi = \pi P$). Every other $\sigma$-finite invariant measure is proportional to $\pi$.


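Proof For $y \neq x$, set $N_y^x := \sum_{k=0}^{\tau_x - 1} 1_{\{X_k = y\}}$. By the strong Markov property, for
all $k \ge 1$,

\[ \mathbb{P}_x(N_y^x \ge k+1) = \mathbb{P}_x(\tau_y^{(k+1)} < \tau_x) = \mathbb{P}_x(\tau_y^{(k)} < \tau_x;\ \tau_y^{(k+1)} < \tau_x) = \mathbb{P}_x(\tau_y^{(k)} < \tau_x)\, \mathbb{P}_y(\tau_y < \tau_x), \]

so that $\mathbb{P}_x(N_y^x \ge k+1) = b\, a^k$ for all $k \ge 0$, where $b = \mathbb{P}_x(\tau_y < \tau_x) > 0$ and
$a = \mathbb{P}_y(\tau_y < \tau_x) < 1$ (by irreducibility). This proves that

\[ 0 < \pi(y) = \mathbb{E}_x(N_y^x) = \frac{b}{1 - a} < \infty. \]

Invariance of $\pi$ is proved exactly as in Theorem 2.6 (iii). Clearly $\pi(M) = \infty$ for
otherwise $\frac{\pi}{\pi(M)}$ would be an invariant probability measure, in contradiction with the
assumption that the chain is null recurrent.

It remains to show that every other $\sigma$-finite invariant measure is proportional
to $\pi$. Let $Q(x, y) = \frac{\pi(y) P(y, x)}{\pi(x)}$. Then $Q$ is a Markov kernel and
$Q^n(x, y) = \frac{\pi(y) P^n(y, x)}{\pi(x)}$. It follows that $Q$ is also irreducible and null recurrent by
application of Proposition 2.1. Let now $\nu$ be another $\sigma$-finite invariant measure. Then
$h(x) = \frac{\nu(x)}{\pi(x)}$ is harmonic for $Q$, hence constant (see Exercise 2.13). This concludes
the proof. □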


2.2 Subsets of Recurrent Sets


Given $C \subset M$, we let

\[ \tau_C = \tau_C^{(1)} := \inf\{n \ge 1 : X_n \in C\}, \]

and

\[ \tau_C^{(k+1)} := \inf\{n > \tau_C^{(k)} : X_n \in C\} \]

for all $k \ge 1$. We also set $\tau_C^{(0)} := 0$. The next proposition shows that, whenever $P$ is
irreducible, recurrence (respectively positive recurrence) of the chain is equivalent
to recurrence (positive recurrence) of any finite subset.
Proposition 2.16 Suppose $P$ is irreducible and let $C \subset M$ be a nonempty finite set
such that for all $x \in C$, $\mathbb{P}_x(\tau_C < \infty) = 1$ (respectively $\mathbb{E}_x(\tau_C) < \infty$). Then $P$ is
recurrent (respectively positive recurrent).


Proof Let $x \in C$. Then, since $\mathbb{P}_y(\tau_C < \infty) = 1$ for all $y \in C$, the strong Markov
property implies that $(X_n)$ visits $C$ infinitely often $\mathbb{P}_x$-almost surely. Since $C$ is
finite, it follows that $\mathbb{P}_x$-almost surely, there is $y \in C$ such that $N_y = \infty$. If $P$
was transient, we would have by Proposition 2.1 that $\mathbb{P}_x\big(\bigcup_{y \in C} \{N_y = \infty\}\big) = 0$, a
contradiction. Hence $P$ is recurrent.

Suppose now that $K := \max_{x \in C} \mathbb{E}_x(\tau_C) < \infty$. Let $Q$ be the Markov kernel on
$C$ defined by $Q(x, y) := \mathbb{P}_x(X_{\tau_C} = y)$ for $x, y \in C$. Since $C$ is finite, $Q$ admits an
invariant probability measure $\pi$ (see Remark 2.8). Thus, if $X_0$ has law $\pi$, then $X_{\tau_C}$
has also law $\pi$. It follows (by a proof similar to the proof of Theorem 2.6 (iii) or by
Exercise 4.24) that the measure $\mu$ defined by

\[ \mu f = \frac{\mathbb{E}_\pi\big(\sum_{k=0}^{\tau_C - 1} f(X_k)\big)}{\mathbb{E}_\pi(\tau_C)} \]

is invariant for $P$. Note here that $\mathbb{E}_\pi(\tau_C) \le K < \infty$. This proves positive
recurrence. □


Exercise 2.17 Suppose $P$ is irreducible, $C \subset M$ is finite and for all $x \in M \setminus C$,
$\mathbb{P}_x(\tau_C < \infty) = 1$. Show that $P$ is recurrent. Hint: If $M \setminus C \neq \emptyset$, prove that for all
$x \in C$, $\mathbb{P}_x(\tau_{M \setminus C} < \infty) = 1$ and then use Proposition 2.16.


The next result extends and generalizes Proposition 2.16. The second part contains
a classical result originally due to Chung [16]. The proof given here is different.


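Proposition 2.18 Suppose $P$ is irreducible and let $C \subset M$ be a finite set.


(i) Assume that for some $\lambda_0 > 0$ and all $x \in C$, $\mathbb{E}_x(e^{\lambda_0 \tau_C}) < \infty$. Then, for all
$x, y \in M$, there exists $\lambda \in (0, \lambda_0]$ such that

\[ \mathbb{E}_x(e^{\lambda \tau_y}) < \infty. \]

(ii) Let $p \ge 1$ and suppose that for all $x \in C$, $\mathbb{E}_x(\tau_C^p) < \infty$. Then, for all $x, y \in M$,

\[ \mathbb{E}_x(\tau_y^p) < \infty. \]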


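Proof


(i) First assume that $M = C$. In this case there exists, by irreducibility, some
$\varepsilon > 0$ such that for all $x, y \in M$ and $k := \mathrm{card}(M)$, $\mathbb{P}_x(\tau_y > k) \le 1 - \varepsilon$.
Therefore, by the Markov property and induction on $n \ge 1$,

\[ \mathbb{P}_x(\tau_y > nk) = \mathbb{E}_x\big(1_{\{\tau_y > (n-1)k\}}\, \mathbb{P}_{X_{(n-1)k}}(\tau_y > k)\big) \le (1 - \varepsilon)^n. \]

Thus, for all $n \ge 0$,

\[ \mathbb{P}_x(\tau_y > n) \le \mathbb{P}_x(\tau_y > k \lfloor n/k \rfloor) \le (1 - \varepsilon)^{\lfloor n/k \rfloor}, \]

where $\lfloor n/k \rfloor$ is the largest integer less than or equal to $n/k$. Hence, for $\alpha > 0$ so
small that $e^{\alpha k}(1 - \varepsilon) < 1$,

\[ \mathbb{E}_x(e^{\alpha \tau_y}) \le \sum_{n=1}^{\infty} e^{\alpha n}\, \mathbb{P}_x(\tau_y \ge n) < \infty. \]

We now turn to the proof of the first statement in full generality. Let

\[ Y_n := X_{\tau_C^{(n)}}. \]

Such a definition makes sense because, by recurrence, $\tau_C^{(n)} < \infty$ almost surely.
For all $y \in C$, set $\sigma_y := \inf\{n \ge 1 : Y_n = y\}$. For $x \in C$, $(Y_n)$ is a $C$-valued
Markov chain on the probability space $(M^{\mathbb{N}}, \mathcal{B}(M^{\mathbb{N}}), \mathbb{P}_x)$, with respect
to the filtration $\{\mathcal{F}_{\tau_C^{(n)}}\}_{n \ge 0}$ and with Markov kernel $Q(a, b) := \mathbb{P}_a(X_{\tau_C} = b)$
introduced in the proof of Proposition 2.16. Thus, by what precedes,

\[ \max_{x, y \in C} \mathbb{E}_x(e^{\alpha \sigma_y}) < \infty \tag{2.2} \]

for some $\alpha > 0$.

By assumption, $\max_{x \in C} \mathbb{E}_x(e^{\lambda_0 \tau_C}) \le e^{\alpha_0}$ for some $\alpha_0 > 0$. By Jensen's
inequality, for all $t \in [0, 1]$, $\mathbb{E}_x(e^{\lambda_0 t \tau_C}) \le \mathbb{E}_x(e^{\lambda_0 \tau_C})^t \le e^{\alpha_0 t}$. Choose
$\lambda \in (0, \lambda_0/2]$ so small that $2\lambda \alpha_0 \le \lambda_0 \alpha$. Then

\[ \max_{x \in C} \mathbb{E}_x(e^{2\lambda \tau_C}) \le e^{\alpha}. \]

Set $M_n := e^{2\lambda \tau_C^{(n)} - n\alpha}$. The previous inequality combined with the strong
Markov property shows that $(M_n)$ is a supermartingale under $\mathbb{P}_x$ with respect
to the filtration $\{\mathcal{F}_{\tau_C^{(n)}}\}_n$. Therefore, using Theorem A.4 on optional stopping,
$(M_{n \wedge \sigma_y})$ is again a supermartingale, and in particular $\mathbb{E}_x(M_{n \wedge \sigma_y}) \le \mathbb{E}_x(M_0) = 1$.
Together with Hölder's inequality, this yields for all $x, y \in C$

\[ \mathbb{E}_x\big(e^{\lambda \tau_C^{(n \wedge \sigma_y)}}\big) \le \big(\mathbb{E}_x(M_{n \wedge \sigma_y})\big)^{1/2} \big(\mathbb{E}_x(e^{\alpha (n \wedge \sigma_y)})\big)^{1/2} \le \big(\mathbb{E}_x(e^{\alpha \sigma_y})\big)^{1/2} < \infty. \]

Thus,

\[ \mathbb{E}_x(e^{\lambda \tau_y}) = \mathbb{E}_x\big(e^{\lambda \tau_C^{(\sigma_y)}}\big) < \infty \]

for all $x, y \in C$.
In order to conclude the proof, it suffices to show that for any finite set $C'$
containing $C$, $\max_{x \in C'} \mathbb{E}_x(e^{\lambda_0 \tau_{C'}}) < \infty$ (note that $\tau_{C'} \le \tau_C$). Then, by what
precedes (with $C'$ in place of $C$), this will imply that $\max_{x, y \in C'} \mathbb{E}_x(e^{\lambda' \tau_y}) < \infty$
for some $\lambda' \in (0, \lambda_0]$.

We reason like in the proof of Theorem 2.6 (iv). Let $C' \supset C$, $y \in C' \setminus C$. Fix
$x \in C$. Then, for some $k \ge 1$, $P^k(x, y) > 0$. Let $\tau_{k,C} := \min\{n \ge k : X_n \in C\}$.
One has $\tau_{k,C} \le \tau_C^{(k)}$. Thus,

\[
\begin{aligned}
e^{\lambda_0 k}\, P^k(x, y)\, \mathbb{E}_y(e^{\lambda_0 \tau_C})
&\le \mathbb{E}_x\big(e^{\lambda_0 k}\, \mathbb{E}_{X_k}(e^{\lambda_0 \tau_C})\, 1_{\{X_k \notin C\}}\big) = \mathbb{E}_x\big(e^{\lambda_0 \tau_{k,C}}\, 1_{\{X_k \notin C\}}\big) \\
&\le \mathbb{E}_x\big(e^{\lambda_0 \tau_C^{(k)}}\big) \le \Big[\max_{z \in C} \mathbb{E}_z(e^{\lambda_0 \tau_C})\Big]^k < \infty.
\end{aligned}
\]

This concludes the proof of (i).

(ii) Slightly adapting the previous argument, one easily shows that

\[ \max_{x \in C} \mathbb{E}_x(\tau_C^p) < \infty \ \Rightarrow\ \max_{x \in C'} \mathbb{E}_x(\tau_{C'}^p) < \infty \]

for any finite set $C'$ containing $C$. It then suffices to show that, for all $x, y \in C$,
$\mathbb{E}_x(\tau_y^p) < \infty$.

By the assumption and the strong Markov property, there exists $K > 0$ such
that for every $n \ge 0$,

\[ \mathbb{E}_x\big((\tau_C^{(n+1)} - \tau_C^{(n)})^p \mid \mathcal{F}_{\tau_C^{(n)}}\big) = \mathbb{E}_{X_{\tau_C^{(n)}}}(\tau_C^p) \le K^p. \]

Therefore, with $\|\cdot\|_p = \mathbb{E}_x(|\cdot|^p)^{1/p}$,

\[ \|\tau_y\|_p = \big\|\tau_C^{(\sigma_y)}\big\|_p = \Big\| \sum_{i \ge 0} (\tau_C^{(i+1)} - \tau_C^{(i)})\, 1_{\{i < \sigma_y\}} \Big\|_p \le \sum_{i \ge 0} \big\| (\tau_C^{(i+1)} - \tau_C^{(i)})\, 1_{\{i < \sigma_y\}} \big\|_p. \]

Now

\[ \mathbb{E}_x\big((\tau_C^{(i+1)} - \tau_C^{(i)})^p\, 1_{\{i < \sigma_y\}}\big) = \mathbb{E}_x\big(\mathbb{E}_x\big((\tau_C^{(i+1)} - \tau_C^{(i)})^p \mid \mathcal{F}_{\tau_C^{(i)}}\big)\, 1_{\{i < \sigma_y\}}\big) \le K^p\, \mathbb{P}_x(\sigma_y > i). \]

Thus

\[ \|\tau_y\|_p \le K \sum_{i \ge 0} \mathbb{P}_x(\sigma_y > i)^{1/p} < \infty, \]

because, as seen in the beginning of the proof, the law of $\sigma_y$ has a geometric
tail. □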


2.3 Recurrence and Lyapunov Functions 


By Proposition 2.1, the divergence (respectively convergence) of the series
$\sum_{k \ge 1} P^k(x, x)$ is a criterion for the recurrence (transience) of the point $x$, but
such a criterion may be difficult to verify in practice. We discuss here other criteria
based on Lyapunov functions, a tool that will play a key role in the next chapters.
In brief, a Lyapunov function is a map $V : M \to [1, \infty)$ such that $PV - V \le 0$
outside a certain subset $C \subset M$. Lyapunov functions are practical tools to ensure
that the assumptions of Propositions 2.16 and 2.18 are satisfied.

A map $V : M \to \mathbb{R}_+$ is called proper if for every $R > 0$, the set $\{x \in M :
V(x) \le R\}$ is finite. If $M$ is finite, every map $V : M \to \mathbb{R}_+$ is proper. If $M$ is
countably infinite and $(x_n)_{n \ge 1}$ is any enumeration of the elements of $M$, $V : M \to
\mathbb{R}_+$ is proper if and only if $\lim_{n \to \infty} V(x_n) = \infty$.

Apart from the first assertion, the following result is a consequence of a more
general result (Proposition 7.12) that will be proved later.


Theorem 2.19 Let $P$ be a Markov kernel, let $V : M \to [1, \infty)$ be a map, and let
$C \subset M$ be nonempty. Consider the following conditions:


(a) $P$ is irreducible, $PV - V \le 0$ on $M \setminus C$ and $V$ is proper;
(b) $PV - V \le -1$ on $M \setminus C$ and $PV < \infty$ on $C$;
(b') Condition (b) and in addition

\[ \sup_{x \in M} \mathbb{E}_x\big(|V(X_1) - V(x)|^p\big) < \infty \]

for some $p \ge 1$;
(c) $PV - V \le -\lambda V$ on $M \setminus C$ for some $\lambda \in (0, 1)$ and $PV < \infty$ on $C$.

Then, for all $x \in M$,


(i) Under condition (a),

\[ \mathbb{P}_x(\tau_C < \infty) = 1; \]

(ii) Under condition (b),

\[ \mathbb{E}_x(\tau_C) \le PV(x) + 1; \]

(iii) Under condition (b'),

\[ \mathbb{E}_x(\tau_C^p) \le c\,(1 + V^p(x)) \]

for some constant $c > 0$ that depends on $p$ but does not depend on $x$;
(iv) Under condition (c),

\[ \mathbb{E}_x(e^{\lambda \tau_C}) \le \mathbb{E}_x\big(e^{-\log(1 - \lambda)\, \tau_C}\big) \le \frac{1}{1 - \lambda}\, PV(x). \]


In particular, if $P$ is irreducible and if $C$ is finite, conditions (a), (b), (b'), (c)
respectively ensure recurrence of $P$, positive recurrence of $P$, $p$-th moments for
the hitting times $\tau_y$ under $\mathbb{P}_x$, and exponential moments for $\tau_y$ under $\mathbb{P}_x$ for every
$x, y \in M$.


Proof We only prove the first assertion. The other three follow from Proposition 7.12
to be proved later. When $P$ is irreducible and when $C$ is finite, recurrence,
positive recurrence, $p$-th moments, and exponential moments of hitting times are
direct consequences of Propositions 2.16 and 2.18.

By irreducibility, the chain is either recurrent or transient. If it is recurrent,
$\mathbb{P}_x(\tau_C < \infty) = 1$ for every $x \in M$ by Proposition 2.1. Suppose the chain
is transient. For $x \in M \setminus C$, the sequence $V_n := V(X_{n \wedge \tau_C})$ is under $\mathbb{P}_x$ a
supermartingale because $\mathbb{E}_x(V_{n+1} - V_n \mid \mathcal{F}_n) = (PV(X_n) - V(X_n))\, 1_{\{\tau_C > n\}} \le 0$.
Thus, being nonnegative, $(V_n)$ converges $\mathbb{P}_x$-almost surely to some random variable
$V_\infty$ taking values in $[0, \infty)$ (apply Theorem A.6 to the submartingale $(-V_n)$).
This shows that $V(X_n)$ converges $\mathbb{P}_x$-almost surely on $\{\tau_C = \infty\}$. On the other
hand, by transience (Proposition 2.1 (iii)) and by the assumption that $V$ is proper,
$\limsup_{n \to \infty} V(X_n) = \infty$ $\mathbb{P}_x$-almost surely, and therefore $\mathbb{P}_x(\tau_C < \infty) = 1$. And
for $x \in C$, we have by the Markov property

\[ \mathbb{P}_x(\tau_C < \infty) = \mathbb{P}_x(X_1 \in C) + \mathbb{E}_x\big(1_{\{X_1 \in M \setminus C\}}\, \mathbb{P}_{X_1}(\tau_C < \infty)\big) = 1. \]
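
As a concrete instance of condition (b), consider the reflected walk of Exercise 2.12 with $p < 1/2$ and $C = \{0\}$. The Python sketch below is an added illustration, not taken from the text. It assumes the candidate function $V(x) = 1 + x/(q - p)$ (our choice), checks that $PV - V = -1$ off $C$, and compares the bound $\mathbb{E}_x(\tau_C) \le PV(x) + 1$ of Theorem 2.19 (ii) with the classical closed form $\mathbb{E}_x(\tau_0) = x/(q - p)$ for $x \ge 1$, which is used here only as a reference value.

```python
from fractions import Fraction


def row(x, p, r):
    """Nonzero entries of row x of the reflected walk on N (Exercise 2.12)."""
    if x == 0:
        return {0: r, 1: 1 - r}
    return {x + 1: p, x - 1: 1 - p}


def apply_kernel(V, x, p, r):
    """Compute PV(x) = sum_y P(x, y) V(y)."""
    return sum(prob * V(y) for y, prob in row(x, p, r).items())


def make_lyapunov(p):
    """Assumed candidate V(x) = 1 + x / (q - p), valid for p < 1/2."""
    q = 1 - p
    return lambda x: 1 + Fraction(x) / (q - p)


def mean_hitting_time_of_zero(x, p, r):
    """Reference value of E_x(tau_0): x/(q-p) for x >= 1, first-step formula at 0."""
    q = 1 - p
    if x >= 1:
        return Fraction(x) / (q - p)
    return 1 + (1 - r) / (q - p)


if __name__ == "__main__":
    p, r = Fraction(1, 3), Fraction(1, 2)
    V = make_lyapunov(p)
    for x in range(0, 5):
        drift = apply_kernel(V, x, p, r) - V(x)
        bound = apply_kernel(V, x, p, r) + 1
        print(x, drift, mean_hitting_time_of_zero(x, p, r), bound)
```

With $p = 1/3$ the drift equals $-1$ at every $x \ge 1$, and the bound exceeds the true mean hitting time by exactly one.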
Exercise 2.20 Suppose $V : M \to [1, \infty)$ is a proper map. Show that condition
(c) in Theorem 2.19 for a nonempty finite set $C$ is equivalent to the existence of
constants $0 \le \rho < 1$ and $\kappa \ge 0$ such that

\[ PV \le \rho V + \kappa. \]

Show that under such a condition, every invariant probability measure $\pi$ satisfies

\[ \pi V \le \frac{\kappa}{1 - \rho} < \infty. \]

See Corollary 4.23 for a proof of the second assertion.


2.4 Aperiodic Chains 


We start with a general definition of aperiodicity. Let $R \subset \mathbb{N}^*$ be a (nonempty) set
closed under addition. That is

\[ i, j \in R \ \Rightarrow\ i + j \in R. \]

The period of $R$ is defined as its greatest common divisor. If this period is 1, $R$ is
said to be aperiodic. Aperiodic sets enjoy the following useful property, that will be
used repeatedly throughout the book.


Proposition 2.21 Let $R$ be aperiodic. Then there exists $n_0 \in \mathbb{N}$ such that $n_0 + \mathbb{N} =
\{n \in \mathbb{N} : n \ge n_0\} \subset R$.


Proof There exist, by aperiodicity, $a_1, \dots, a_l \in R$ whose greatest common divisor
is 1. (To see this, take any element of $R$ and call it $a_1$; then $a_1$ has a finite number
of divisors strictly greater than 1, which we denote by $d_2, \dots, d_l$; for $2 \le i \le l$,
pick $a_i$ from $R$ such that $d_i$ does not divide $a_i$; such $a_i$ exists because the greatest
common divisor of $R$ is 1.) By Bézout's identity, there exist $q_1, \dots, q_l \in \mathbb{Z}$ such that
$\sum_{i=1}^{l} q_i a_i = -1$. Set $a := \sum_{i : q_i > 0} q_i a_i$. The set $R$ being closed under addition,
both $a$ and $a + 1 = \sum_{i : q_i < 0} (-q_i) a_i$ lie in $R$. Every $n \ge a^2$ can be written as
$n = ka + r = (k - r)a + r(a + 1)$ for some $r \in \{0, \dots, a - 1\}$ and $k \ge a$.
Thus, every $n \ge a^2$ is an element of $R$. □
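
Proposition 2.21 can be explored on small examples. The Python sketch below is an added illustration with hypothetical function names: it generates, up to a finite limit, the smallest set containing given generators and closed under addition, then searches for the threshold $n_0$ beyond which every integer (up to the limit) belongs to it.

```python
from functools import reduce
from math import gcd


def additive_closure(generators, limit):
    """Elements <= limit of the smallest set containing `generators`
    and closed under addition."""
    reachable = [False] * (limit + 1)
    for n in range(1, limit + 1):
        reachable[n] = n in generators or any(
            n - g >= 1 and reachable[n - g] for g in generators
        )
    return {n for n in range(1, limit + 1) if reachable[n]}


def smallest_threshold(generators, limit):
    """Smallest n0 such that every n in [n0, limit] lies in the closure.
    Only meaningful when the result is well below `limit`; returns None
    if even `limit` itself is missing."""
    closure = additive_closure(generators, limit)
    n0 = limit + 1
    while n0 - 1 >= 1 and (n0 - 1) in closure:
        n0 -= 1
    return n0 if n0 <= limit else None


if __name__ == "__main__":
    gens = {3, 5}
    print(reduce(gcd, gens))                  # period of the generated set
    print(sorted(additive_closure(gens, 20)))
    print(smallest_threshold(gens, 100))
```

For the generators $\{3, 5\}$ the proof's construction gives, for instance, $3 \cdot 3 - 2 \cdot 5 = -1$, hence $a = 9$ and the bound $a^2 = 81$, whereas the true threshold found by the search is much smaller. The proposition only claims existence of some $n_0$.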


We now turn to the definition of aperiodicity for a countable Markov chain. Given
a kernel $P$ on $M$ and $x \in M$, let $R(x) := \{k \ge 1 : x \leadsto^k x\}$ be the set of possible
return times to $x$. The period of $x$, $\mathrm{per}(x)$, is defined as the period of $R(x)$ and $x$ is
called aperiodic whenever $R(x)$ is. The kernel (or the chain) is said to be aperiodic
if all points $x \in M$ are aperiodic.
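
For a finite transition matrix, the period of a point can be estimated by tracking which states are reachable in exactly $k$ steps. The Python sketch below is an added illustration; the function names and the default horizon of $3 \times$ (number of states) are our assumptions, and the returned value is the greatest common divisor of the return times observed up to that horizon.

```python
from math import gcd


def return_times(P, x, max_steps):
    """Set of k in [1, max_steps] with P^k(x, x) > 0, using supports only."""
    n = len(P)
    support = {x}                     # states reachable from x in exactly k steps
    times = set()
    for k in range(1, max_steps + 1):
        support = {z for y in support for z in range(n) if P[y][z] > 0}
        if x in support:
            times.add(k)
    return times


def period(P, x, max_steps=None):
    """gcd of the return times to x observed up to max_steps.
    Returns 0 if no return was observed within the horizon."""
    if max_steps is None:
        max_steps = 3 * len(P)        # assumed horizon, adequate for small examples
    g = 0
    for k in return_times(P, x, max_steps):
        g = gcd(g, k)
    return g


if __name__ == "__main__":
    flip = [[0, 1], [1, 0]]
    lazy = [[0.5, 0.5], [1, 0]]
    cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(period(flip, 0), period(lazy, 0), period(cycle, 1))
```

Working with supports instead of matrix powers avoids floating-point issues, since only the positivity of $P^k(x, x)$ matters for the period.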
Proposition 2.22 Suppose $P$ is irreducible. Then


(i) All points $x \in M$ have the same period;
(ii) $P$ is aperiodic if and only if for all $x, y \in M$ there exists $n(x, y) \in \mathbb{N}$ such that
$x \leadsto^n y$ for all $n \ge n(x, y)$.


Proof


(i) Let $x, y \in M$. By irreducibility, there exist $i, j \in \mathbb{N}^*$ such that $x \leadsto^i y$ and
$y \leadsto^j x$. Thus $i + j \in R(x)$ and for all $k \in R(y)$, $i + j + k \in R(x)$.
Therefore, $\mathrm{per}(x)$ divides $i + j$ and $i + j + k$, hence $k$, for all $k \in R(y)$. Thus
$\mathrm{per}(x) \le \mathrm{per}(y)$ and by symmetry $\mathrm{per}(x) = \mathrm{per}(y)$.

(ii) The "if" part is obvious. We prove the "only if" part. Given $y \in M$, there
exists, by Proposition 2.21, $n_0 \in \mathbb{N}$ such that $n \in R(y)$ for all $n \ge n_0$. If now
$x$ is another point in $M$, $x \leadsto^i y$ for some $i$ by irreducibility, hence $x \leadsto^n y$
for all $n \ge n_0 + i$.

□


An immediate useful consequence of Proposition 2.22 is the next result. Given two
Markov kernels $P$ and $\tilde{P}$ respectively defined on the countable state spaces $M$ and
$\tilde{M}$, we let $P \otimes \tilde{P}$ denote the Markov kernel on $M \times \tilde{M}$ corresponding to two
independent chains with kernels $P$, $\tilde{P}$. That is

\[ (P \otimes \tilde{P})\big((x, \tilde{x}); (y, \tilde{y})\big) = P(x, y)\, \tilde{P}(\tilde{x}, \tilde{y}). \]


Corollary 2.23 If $P$ and $\tilde{P}$ are both irreducible and aperiodic, so is $P \otimes \tilde{P}$. If in
addition $P$ and $\tilde{P}$ are positive recurrent, so is $P \otimes \tilde{P}$.


Proof Note that $(P \otimes \tilde{P})^n = P^n \otimes \tilde{P}^n$ for every $n \in \mathbb{N}^*$. Thus, irreducibility (and
aperiodicity) of $P \otimes \tilde{P}$ follows from Proposition 2.22 (ii), applied to $P$ and $\tilde{P}$. Also,
if $\pi$ and $\tilde{\pi}$ are invariant probability measures for $P$ and $\tilde{P}$, so is $\pi \otimes \tilde{\pi}$ (defined as
$(\pi \otimes \tilde{\pi})(x, \tilde{x}) := \pi(x)\, \tilde{\pi}(\tilde{x})$) for $P \otimes \tilde{P}$. By Theorem 2.6, this proves positive
recurrence. □
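
The product kernel and the identity $(P \otimes \tilde{P})^n = P^n \otimes \tilde{P}^n$ are easy to reproduce on finite matrices. The Python sketch below is an added illustration with hypothetical function names; it also shows, on the deterministic two-state flip chosen by us, that irreducibility of the product can fail when the factors are periodic, which is why Corollary 2.23 assumes aperiodicity.

```python
from fractions import Fraction
from itertools import product


def tensor(P, Q):
    """Kernel of two independent chains: ((x, u), (y, v)) -> P[x][y] * Q[u][v]."""
    states = list(product(range(len(P)), range(len(Q))))
    K = [[P[x][y] * Q[u][v] for (y, v) in states] for (x, u) in states]
    return states, K


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_irreducible(K):
    """Check that every state leads to every state in at least one step."""
    n = len(K)
    for s in range(n):
        seen, frontier = set(), {s}
        while frontier:
            nxt = {z for y in frontier for z in range(n) if K[y][z] > 0} - seen
            seen |= nxt
            frontier = nxt
        if len(seen) < n:
            return False
    return True


if __name__ == "__main__":
    half = Fraction(1, 2)
    flip = [[0, 1], [1, 0]]                  # irreducible, period 2
    lazy = [[half, half], [half, half]]      # irreducible, aperiodic
    print(is_irreducible(tensor(flip, flip)[1]))   # parity is preserved
    print(is_irreducible(tensor(lazy, lazy)[1]))
    A = [[Fraction(1, 3), Fraction(2, 3)], [1, 0]]
    K = tensor(A, flip)[1]
    print(matmul(K, K) == tensor(matmul(A, A), matmul(flip, flip))[1])
```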


Exercise 2.24 Give an example of an irreducible and positive recurrent kernel $P$
such that $P \otimes P$ is not irreducible, and an example of an irreducible recurrent kernel
$P$ such that $P \otimes P$ is irreducible and transient.


Exercise 2.25 Show that if $P \otimes \tilde{P}$ is irreducible, then both $P$ and $\tilde{P}$ are irreducible.
Also show that if $P \otimes \tilde{P}$ is irreducible and recurrent, then both $P$ and $\tilde{P}$ are recurrent.


Exercise 2.26 Let $(X_n)_{n \ge 0}$ be a Markov chain on $\mathbb{Z} \setminus \{0\}$ whose transition matrix
$P$ is given by

\[ P(i, i+1) = P(i, -i) = 1/2, \quad i \ge 1, \]

\[ P(-1, 1) = P(i, i+1) = 1, \quad i \le -2. \]

(i) Draw the weighted directed graph associated with $(X_n)$ and determine whether
the chain is irreducible.
(ii) Find the period of the chain.
(iii) Find a Lyapunov function $V$ and a finite set $C \subset \mathbb{Z} \setminus \{0\}$ such that $P$, $V$, and
$C$ satisfy condition (b) of Theorem 2.19.
(iv) Show that $(X_n)_{n \ge 0}$ is positive recurrent and find its unique invariant probability
measure.


2.5 The Convergence Theorem 


The main result of this section is the convergence theorem for irreducible aperiodic 
Markov chains. This theorem is sometimes called the ergodic theorem in the 
literature, but we prefer to reserve this terminology for Birkhoff's ergodic theorem. 


Theorem 2.27 Suppose $P$ is irreducible and aperiodic. Let $\mu$ be a probability
measure on $M$.


(i) If $P$ is positive recurrent with invariant probability measure $\pi$, then

\[ \lim_{n \to \infty} \sup_{z \in M} |\mu P^n(z) - \pi(z)| = 0. \]

(ii) If $P$ is not positive recurrent, then, for all $z \in M$,

\[ \lim_{n \to \infty} \mu P^n(z) = 0. \]


Proof Let $(X_n, Y_n)_{n \in \mathbb{N}}$ be the canonical chain on $(M \times M)^{\mathbb{N}}$ (i.e., $(X_n, Y_n)(\omega, \tilde{\omega}) :=
(\omega_n, \tilde{\omega}_n)$), and let

\[ \tau_\Delta := \inf\{n \ge 1 : (X_n, Y_n) \in \Delta\}, \]

where $\Delta := \{(x, x) : x \in M\}$ is the diagonal of $M \times M$. Throughout the proof, we write
$\mathbb{P}_\alpha$ (respectively $\mathbb{P}_{x,y}$) for the Markov measure on $(M \times M)^{\mathbb{N}}$ with kernel $P \otimes P$
and initial distribution $\alpha$ (respectively $\delta_{x,y}$). By Corollary 2.23, $P \otimes P$ is irreducible,
hence either recurrent or transient.


Case 1 $P \otimes P$ is recurrent. For all $x, y, z \in M$,

\[
\begin{aligned}
\mathbb{P}_{x,y}(X_n = z) &= \mathbb{P}_{x,y}(X_n = z;\ \tau_\Delta > n) + \mathbb{P}_{x,y}(X_n = z;\ \tau_\Delta \le n) \\
&= \mathbb{P}_{x,y}(X_n = z;\ \tau_\Delta > n) + \mathbb{P}_{x,y}(Y_n = z;\ \tau_\Delta \le n) \\
&\le \mathbb{P}_{x,y}(\tau_\Delta > n) + \mathbb{P}_{x,y}(Y_n = z),
\end{aligned}
\]

where the second equality follows from the strong Markov property and the fact that
$X_{\tau_\Delta} = Y_{\tau_\Delta}$. Interchanging the roles of $X_n$ and $Y_n$, one also has

\[ \mathbb{P}_{x,y}(Y_n = z) \le \mathbb{P}_{x,y}(\tau_\Delta > n) + \mathbb{P}_{x,y}(X_n = z). \]

Hence

\[ |P^n(x, z) - P^n(y, z)| = |\mathbb{P}_{x,y}(X_n = z) - \mathbb{P}_{x,y}(Y_n = z)| \le \mathbb{P}_{x,y}(\tau_\Delta > n), \]

and by integration

\[ |\mu P^n(z) - \nu P^n(z)| \le \mathbb{P}_{\mu \otimes \nu}(\tau_\Delta > n) \tag{2.3} \]

for every probability measure $\nu$ on $M$ and every $z \in M$. By recurrence of $P \otimes P$
(and Proposition 2.1 (iii)), one has for every $x, y \in M$ that $\mathbb{P}_{x,y}(\tau_\Delta > n) \to 0$ as
$n \to \infty$. Thus

\[ \lim_{n \to \infty} \sup_{z \in M} |\mu P^n(z) - \nu P^n(z)| = 0 \tag{2.4} \]

by dominated convergence. In light of Exercise 2.25, there are two subcases: $P$ is
either positive recurrent or null recurrent. If $P$ is positive recurrent, (2.4) applied to
$\nu = \pi$, the invariant probability measure of $P$, proves part (i) of the theorem. If $P$
is null recurrent, let $\pi$ be an unbounded invariant measure of $P$ (see Theorem 2.15).
For any nonempty finite set $A \subset M$, set $\pi_A(x) := \frac{\pi(x)\, 1_A(x)}{\pi(A)}$. Then $\pi_A \le \frac{\pi}{\pi(A)}$,
whence

\[ \pi_A P^n(z) \le \frac{\pi P^n(z)}{\pi(A)} = \frac{\pi(z)}{\pi(A)}. \]

Therefore, by (2.4) applied to $\nu = \pi_A$,

\[ \limsup_{n \to \infty} \mu P^n(z) \le \lim_{n \to \infty} |\mu P^n(z) - \pi_A P^n(z)| + \frac{\pi(z)}{\pi(A)} = \frac{\pi(z)}{\pi(A)}. \]

Letting $A \uparrow M$ proves (ii) in this case because $\pi(M) = \infty$.

Case 2 $P \otimes P$ is transient. By Proposition 2.1 (i),

\[ [P^n(z, z)]^2 = (P \otimes P)^n\big((z, z); (z, z)\big) \to 0 \]

as $n \to \infty$, for all $z \in M$. By irreducibility of $P$, this implies that $P^n(x, z) \to 0$
for all $x, z \in M$. Thus $\mu P^n(z) \to 0$ by dominated convergence. This proves (ii) in
case 2. □
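
The role of aperiodicity in Theorem 2.27 is visible on very small examples. The Python sketch below is an added illustration with hypothetical function names: it iterates $\mu \mapsto \mu P$ exactly and records $\sup_z |\mu P^n(z) - \pi(z)|$, first for an aperiodic three-state chain, then for the periodic two-state flip where the distance never decreases.

```python
from fractions import Fraction


def step(mu, P):
    """Return the row vector mu P."""
    n = len(P)
    return [sum(mu[y] * P[y][x] for y in range(n)) for x in range(n)]


def sup_distances(mu, P, pi, n_steps):
    """Return [sup_z |mu P^n(z) - pi(z)| for n = 0, ..., n_steps]."""
    errors = []
    current = list(mu)
    for _ in range(n_steps + 1):
        errors.append(max(abs(c - t) for c, t in zip(current, pi)))
        current = step(current, P)
    return errors


if __name__ == "__main__":
    F = Fraction
    P = [[F(1, 2), F(1, 2), F(0)],
         [F(1, 4), F(1, 2), F(1, 4)],
         [F(0), F(1, 2), F(1, 2)]]
    pi = [F(1, 4), F(1, 2), F(1, 4)]
    print(sup_distances([F(1), F(0), F(0)], P, pi, 5))   # decays geometrically

    flip = [[0, 1], [1, 0]]
    print(sup_distances([F(1), F(0)], flip, [F(1, 2), F(1, 2)], 5))  # stays at 1/2
```

In the aperiodic example the distance is halved at each step after the first, in line with the geometric rate discussed after the theorem.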
As shown below, the convergence in Theorem 2.27 is geometric if there exists a
proper map that satisfies condition (c) of Theorem 2.19 for a nonempty finite set $C$
(see also Exercise 2.20).


Theorem 2.28 Suppose $P$ is irreducible and aperiodic, and that there exists a
proper map $V : M \to [1, \infty)$ and constants $0 \le \rho < 1$, $\kappa \ge 0$ such that

\[ PV \le \rho V + \kappa. \]

Then $P$ is positive recurrent and, denoting by $\pi$ its invariant probability measure:


(i) One has $\pi V \le \frac{\kappa}{1 - \rho} < \infty$;
(ii) There exist constants $0 < \gamma < 1$ and $c > 0$ such that for every probability
measure $\mu$ on $M$,

\[ \sup_{z \in M} |\mu P^n(z) - \pi(z)| \le c\, \gamma^n (\mu V + 1), \quad \forall n \in \mathbb{N}. \]


Corollary 2.29 Suppose $M$ is finite and $P$ irreducible and aperiodic, with invariant
probability measure $\pi$. Then there exist constants $0 < \gamma < 1$ and $c > 0$ such that
for every probability measure $\mu$ on $M$,

\[ \sup_{z \in M} |\mu P^n(z) - \pi(z)| \le c\, \gamma^n, \quad \forall n \in \mathbb{N}. \]


Proof Take $V = 1$ in Theorem 2.28. □


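Proof (Of Theorem 2.28) We use the same notation, $P \otimes P$, $(X_n, Y_n)$, $\Delta$, etc., as in
the proof of Theorem 2.27.

Positive recurrence follows from Exercise 2.20 and Theorem 2.19. Assertion (i)
follows from Exercise 2.20. By inequality (2.3) from the proof of Theorem 2.27, it
suffices to derive an exponential upper bound on $\mathbb{P}_{\mu \otimes \pi}(\tau_\Delta > n)$ in order to prove
assertion (ii). Pick $x^* \in M$ and choose $\varepsilon > 0$ small enough so that $V(x^*) \le \frac{\kappa}{\varepsilon}$ and
$\rho + \varepsilon < 1$. Set $W(x, y) := V(x) + V(y)$, $x, y \in M$. Then

\[ (P \otimes P) W(x, y) = PV(x) + PV(y) \le \rho\, W(x, y) + 2\kappa, \]

so that $(P \otimes P) W \le (\rho + \varepsilon) W$ on the complement of the set

\[ C := \Big\{(x, y) : W(x, y) \le \frac{2\kappa}{\varepsilon}\Big\}. \]

By Theorem 2.19 (iv) and assertion (i), we then obtain, for some positive constant
$c$ depending on $\kappa$, $\rho$, and $\varepsilon$,

\[ \mathbb{E}_{\mu \otimes \pi}\big(e^{-\log(\rho + \varepsilon)\, \tau_C}\big) \le \frac{(\mu \otimes \pi)\big((P \otimes P) W\big)}{\rho + \varepsilon} \le \frac{\rho\,(\mu V + \pi V) + 2\kappa}{\rho + \varepsilon} \le c\,(1 + \mu V). \]

Since $V$ is proper, the set $C$ is finite, and Proposition 2.18 (i) together with
$(x^*, x^*) \in C$ yield the existence of $\lambda > 0$ such that

\[ \max_{(x, y) \in C} \mathbb{E}_{x,y}\big(e^{\lambda \tau_{(x^*, x^*)}}\big) < \infty. \]

Thus, by Markov's inequality,

\[
\begin{aligned}
\mathbb{P}_{\mu \otimes \pi}(\tau_\Delta > n) &\le \mathbb{P}_{\mu \otimes \pi}(\tau_{(x^*, x^*)} > n) \\
&\le \mathbb{P}_{\mu \otimes \pi}(\tau_C > n/2) + \mathbb{E}_{\mu \otimes \pi}\big(\mathbb{P}_{(X_{\tau_C}, Y_{\tau_C})}(\tau_{(x^*, x^*)} > n/2)\big) \\
&\le c\, \gamma^n (1 + \mu V)
\end{aligned}
\]

for some other constant $c$ and $\gamma := \max\big((\rho + \varepsilon)^{1/2}, e^{-\lambda/2}\big) < 1$. Inequality (2.3)
concludes the proof. □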


2.6 Application to Renewal Theory 


Let $(\Delta_j)_{j \ge 1}$ be a sequence of i.i.d. random variables living on some probability
space $(\Omega, \mathcal{F}, \mathbb{P})$ and taking values in $\mathbb{N}$. Let $\Delta_0$ be another $\mathbb{N}$-valued random
variable on $(\Omega, \mathcal{F}, \mathbb{P})$, independent of $(\Delta_j)_{j \ge 1}$ but having a possibly different
distribution. Set

\[ T_n := \Delta_0 + \Delta_1 + \dots + \Delta_n. \]

The sequence $T := (T_n)_{n \in \mathbb{N}}$ is called a renewal process; $T_0 = \Delta_0$ is the delay of the
process, and $\{T_n : n \ge 0\}$ is the set of renewal times. Observe that $T$ is a Markov
chain with respect to the filtration $\mathcal{F}_n := \sigma(\Delta_0, \dots, \Delta_n)$, whose transition matrix
has entries $A(i, j) := \mathbb{P}(\Delta_1 = j - i)$.
Let

\[ p_k := \mathbb{P}(\Delta_1 = k) \]

for $k \in \mathbb{N}$. We say that $T$ is aperiodic if $p_0 \neq 1$ and $\{k \ge 1 : p_k > 0\}$ is an aperiodic
set as defined in Sect. 2.4. We say that $T$ is $L^p$ if $\Delta_1$ is in $L^p$, i.e., $\sum_{k \in \mathbb{N}} k^p p_k < \infty$.

To fix ideas, one can imagine that a certain device breaks down and is replaced by
a generic device at times $T_0, T_1, \dots$. The lifespan of the initial device is distributed
as $\Delta_0$ and the lifespans of the replacement devices are distributed as $\Delta_1$.
From now on we shall assume that $T$ is aperiodic. For all $n \in \mathbb{N}$, let

\[ \varsigma_n := \min\{k \ge 0 : T_k \ge n\}. \]

Then $\varsigma_n < \infty$ $\mathbb{P}$-almost surely so that

\[ X_n := T_{\varsigma_n} - n \]

is well-defined. A key observation is the following:


The set of renewal times for $T$ equals the zero set of $(X_n)$,


i.e.,

\[ \{T_n : n \in \mathbb{N}\} = \{n \in \mathbb{N} : X_n = 0\}. \]


It is easily checked that with respect to the filtration $\{\mathcal{F}_{\varsigma_n}\}$, $(X_n)$ is a Markov chain
on $\mathbb{N}$ whose transition matrix is given by

\[ P(k, k-1) = 1 \quad \text{for } k \ge 1, \]

\[ P(0, k) = \frac{p_{k+1}}{1 - p_0} \quad \text{for } k \in \mathbb{N}, \]

and

\[ P(k, l) = 0 \quad \text{for } k \ge 1,\ l \neq k - 1. \]


Let $K := \sup\{k \ge 1 : p_k > 0\} \in \mathbb{N}^* \cup \{\infty\}$ and $M := \{0, \dots, K - 1\}$ (with the
convention that $M = \mathbb{N}$ if $K = \infty$). Then $X_n \in M$ for $n$ large enough (precisely
$n \ge (X_0 - K + 1)^+$). On $M$, the chain $(X_n)$ is irreducible, recurrent, and aperiodic
(by aperiodicity of $T$).


Exercise 2.30 Verify the claims made about $(X_n)$. In particular, show that $(X_n)$ is
a Markov chain with the transition matrix given above, and that $(X_n)$ restricted to
$M$ is irreducible, recurrent, and aperiodic.


Let $\tau_0 = \inf\{n \ge 1 : X_n = 0\}$. Then

\[ \mathbb{E}_0(\tau_0) = \sum_{k \ge 0} (k + 1)\, P(0, k) = \frac{\mathbb{E}(\Delta_1)}{1 - p_0} = \mathbb{E}(\Delta_1 \mid \Delta_1 > 0) \in (0, \infty], \]

where the expectation of a random variable $X$ conditional on an event $A$ of positive
probability is defined as $\mathbb{E}(X \mid A) := \mathbb{E}(X 1_A) / \mathbb{P}(A)$. The equation $\mathbb{E}_0(\tau_0) =
\mathbb{E}(\Delta_1)/(1 - p_0)$ implies that $(X_n)$ is positive recurrent if and only if $T$ is $L^1$.
Exercise 2.31 Assume that $(X_n)$, restricted to $M$, is positive recurrent. Express the
unique invariant probability measure for the transition matrix $P$ in terms of the $p_k$'s.


As a consequence of Theorem 2.27, we obtain the following classical renewal
theorem.


Theorem 2.32 Assume that $T$ is aperiodic. Then

\[ \lim_{k \to \infty} \sum_{n=0}^{\infty} \mathbb{P}(T_n = k) = \frac{1}{\mathbb{E}(\Delta_1)}, \]

with the convention that the right-hand side is zero if $\mathbb{E}(\Delta_1) = \infty$.


Proof Let $N_k := \sum_{n \ge 0} 1_{\{T_n = k\}}$. Then

\[ N_k = 1_{\{X_k = 0\}} \Big(1 + \sum_{i \ge 1} 1_{\{\Delta_{\varsigma_k + 1} + \dots + \Delta_{\varsigma_k + i} = 0\}}\Big). \]

Thus $\mathbb{E}(N_k) = \mathbb{E}(\mathbb{E}(N_k \mid \mathcal{F}_{\varsigma_k})) = \frac{\mathbb{P}(X_k = 0)}{1 - p_0}$, and by Theorems 2.27 and 2.6,

\[ \lim_{k \to \infty} \mathbb{P}(X_k = 0) = \frac{1}{\mathbb{E}_0(\tau_0)}. \]

This proves the result. □
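
The renewal theorem can be observed numerically in the simplest setting. The Python sketch below is an added illustration under simplifying assumptions of ours: zero delay ($\Delta_0 = 0$) and $p_0 = 0$, so that $\sum_n \mathbb{P}(T_n = k)$ is just the probability $u_k$ that $k$ is a renewal time. It computes $u_k$ from the standard renewal recursion $u_0 = 1$, $u_k = \sum_{j=1}^{k} p_j u_{k-j}$, and the function name is hypothetical.

```python
def renewal_probabilities(p, n_max):
    """u[n] = P(n is a renewal time) for a zero-delay renewal process with
    P(Delta_1 = k) = p[k] and p[0] = 0 (assumptions of this sketch)."""
    u = [1.0] + [0.0] * n_max
    for n in range(1, n_max + 1):
        # Condition on the length k of the first inter-renewal interval.
        u[n] = sum(p[k] * u[n - k] for k in range(1, min(n, len(p) - 1) + 1))
    return u


if __name__ == "__main__":
    # Aperiodic case: Delta_1 is 2 or 3 with probability 1/2 each, mean 2.5.
    aperiodic = [0.0, 0.0, 0.5, 0.5]
    u = renewal_probabilities(aperiodic, 200)
    print(u[:8], u[200], 1 / 2.5)

    # Periodic case: Delta_1 = 2 almost surely; u_n alternates and has no limit.
    periodic = [0.0, 0.0, 1.0]
    print(renewal_probabilities(periodic, 7))
```

In the aperiodic example $u_k$ approaches $1/\mathbb{E}(\Delta_1) = 0.4$, while the periodic example shows why aperiodicity of $T$ is assumed.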


2.6.1 Coupling of Renewal Processes 


Suppose that $T$ is $L^1$, and let $\tilde{T}$ be another aperiodic $L^1$-renewal process independent
of $T$ with

\[ \tilde{T}_n = \tilde{\Delta}_0 + \tilde{\Delta}_1 + \dots + \tilde{\Delta}_n. \]

The distribution of $(\tilde{\Delta}_i)_{i \ge 0}$ may be different from the one of $(\Delta_i)_{i \ge 0}$. We are
interested in the first time $\tau > 0$ that is a renewal time for both $T$ and $\tilde{T}$.
Equivalently, with $\tilde{X}_n$ defined in analogy to $X_n$,

\[ \tau := \inf\{n \ge 1 : X_n = \tilde{X}_n = 0\}. \]

We know that $(X_n)$ is absorbed by $M$ in finite time and that it is aperiodic and
positive recurrent on $M$. Hence, $(X_n, \tilde{X}_n)$ is absorbed by $M \times \tilde{M}$ in finite time ($\tilde{M}$
defined in analogy to $M$) and, by Corollary 2.23, it is positive recurrent on $M \times \tilde{M}$.
In particular,

\[ \mathbb{P}(\tau < \infty) = \mathbb{P}_{\alpha \otimes \tilde{\alpha}}(\tau_{0,0} < \infty) = 1, \tag{2.5} \]

where $\alpha$ (respectively $\tilde{\alpha}$) denotes the law of $\Delta_0$ (respectively $\tilde{\Delta}_0$). It turns out that
whenever $\Delta_0$, $\tilde{\Delta}_0$ and $\Delta_1$, $\tilde{\Delta}_1$ are in $L^p$ for some $p \ge 1$, the same is true for $\tau$.
A proof of this fact can be found for instance in Lindvall's book [47] and goes
back to Pitman's seminal paper [55]. We provide here a short proof (different from
Lindvall's) based on Proposition 2.18 and Theorem 2.19.


Theorem 2.33 Suppose $T$ and $\tilde{T}$ are aperiodic and in $L^p$ for some $p \ge 1$. Then
there exists a constant $c > 0$, independent of the distributions of $\Delta_0$ and $\tilde{\Delta}_0$, such
that $\mathbb{E}(\tau^p) \le c\,(1 + \mathbb{E}(\Delta_0^p) + \mathbb{E}(\tilde{\Delta}_0^p))$.


Proof Let $Q := P \otimes \tilde{P}$ denote the kernel of $(X_n, \tilde{X}_n)$. Let $V$ be the function defined
on $\mathbb{N} \times \mathbb{N}$ by $V(i, j) = \max(i, j) + 1$. One has

\[ QV(i, j) - V(i, j) = -1 \quad \text{for } i \neq 0,\ j \neq 0, \]

and (by integrability of $\Delta_1$ and dominated convergence)

\[ \lim_{j \to \infty} QV(0, j) - V(0, j) = \lim_{j \to \infty} \mathbb{E}\big(\max(\Delta_1 - j - 1, -1) \mid \Delta_1 > 0\big) = -1. \]

Similarly, $\lim_{i \to \infty} QV(i, 0) - V(i, 0) = -1$. Condition (b) of Theorem 2.19 is then
satisfied, up to replacing $V$ by $2V$, for the Markov process $(X_n, \tilde{X}_n)$ on $\mathbb{N} \times \mathbb{N}$, with
$C = \{(i, j) \in \mathbb{N} \times \mathbb{N} : V(i, j) \le R\}$ and $R$ large enough. Condition (b') is easily seen
to be satisfied as well because $\Delta_1$ and $\tilde{\Delta}_1$ are in $L^p$. Therefore, there is $c > 0$ such
that for all $(i, j) \in \mathbb{N} \times \mathbb{N}$,

\[ \mathbb{E}_{i,j}(\tau_{0,0}^p) \le 2^{p-1} \Big(\mathbb{E}_{i,j}(\tau_C^p) + \max_{(k, l) \in C} \mathbb{E}_{k,l}(\tau_{0,0}^p)\Big) \le c\,(1 + \max(i, j)^p). \tag{2.6} \]

Here, the first inequality follows from the strong Markov property and inequality
$\tau_{0,0} \le \tau_C + \tau_{0,0} \circ \Theta_{\tau_C}$. The second inequality follows from Theorem 2.19 (iii) and
Proposition 2.18. Note that while $(X, \tilde{X})$ is not necessarily irreducible on $\mathbb{N} \times \mathbb{N}$
and thus a key assumption of Proposition 2.18 is not satisfied, the proof still goes
through because any point $(i, j) \in \mathbb{N} \times \mathbb{N}$ leads to $(0, 0)$. Integrating the inequality
in (2.6) with respect to $\alpha \otimes \tilde{\alpha}$, the law of $(\Delta_0, \tilde{\Delta}_0) = (X_0, \tilde{X}_0)$, gives the result.

□


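Theorem 2.34 Suppose $T$ and $\tilde T$ are aperiodic and

$$\mathbb{E}(e^{\lambda_0 \Delta_1}) + \mathbb{E}(e^{\lambda_0 \tilde\Delta_1}) < \infty$$

for some $\lambda_0 > 0$. Then there exist $0 < \lambda < \lambda_0$ and $c > 0$ such that

$$\mathbb{E}(e^{\lambda \tau}) \le c\,(1 + \mathbb{E}(e^{\lambda_0 \Delta_0}) + \mathbb{E}(e^{\lambda_0 \tilde\Delta_0})).$$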


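Proof The proof is similar to the proof of Theorem 2.33. Set $V(i, j) := e^{\lambda_0 i} + e^{\lambda_0 j}$. Then $QV(i, j) \le e^{-\lambda_0} V(i, j) + \kappa$ with

$$\kappa := \mathbb{E}(e^{\lambda_0 \Delta_1} \mid \Delta_1 > 0) + \mathbb{E}(e^{\lambda_0 \tilde\Delta_1} \mid \tilde\Delta_1 > 0).$$

Condition (c) of Theorem 2.19 is then satisfied for any $0 < \lambda < 1 - e^{-\lambda_0}$ and $C = \{(i, j) \in \mathbb{N} \times \mathbb{N} : V(i, j) \le R\}$ with $R$ sufficiently large given the choice of $\lambda$ (see also Exercise 2.20). Then, relying on $\tau_{(0,0)} \le \tau_C + \tau_{(0,0)} \circ \Theta_{\tau_C}$, the strong Markov property, Theorem 2.19 (iv), and Proposition 2.18, we obtain

$$\mathbb{E}_{i,j}(e^{\lambda \tau_{(0,0)}}) \le c\,(1 + V(i, j)), \qquad \forall (i, j) \in \mathbb{N} \times \mathbb{N}$$

for some $c > 0$ and some $\lambda \in (0, 1 - e^{-\lambda_0})$. Integrating this inequality with respect to the law of $(\Delta_0, \tilde\Delta_0)$ gives the desired result. □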
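
As a numerical illustration of the first common renewal time $\tau$ (a sketch that is not part of the original text), the following Python code simulates two independent renewal processes and reports the first time at which both renew. The function names, the finite `horizon` cutoff, and the two inter-arrival laws (uniform on {1, 2, 3} and on {1, 2}, both aperiodic) are assumptions made for the example; for simplicity the delay is drawn from the same law as the other increments.

```python
import random

def renewal_times(draw, horizon, rng):
    """Set of renewal times T_n = D_0 + ... + D_n that are <= horizon.
    `draw(rng)` must return a positive integer inter-arrival time."""
    times, t = set(), 0
    while True:
        t += draw(rng)
        if t > horizon:
            return times
        times.add(t)

def first_common_renewal(draw_a, draw_b, horizon, rng):
    """First time n >= 1 that is a renewal time of both processes, or None
    if no common renewal occurs before the (hypothetical) cutoff `horizon`."""
    common = renewal_times(draw_a, horizon, rng) & renewal_times(draw_b, horizon, rng)
    return min(common) if common else None

# Hypothetical aperiodic inter-arrival laws, chosen only for illustration.
rng = random.Random(0)
draw_a = lambda r: r.choice([1, 2, 3])
draw_b = lambda r: r.choice([1, 2])
taus = [first_common_renewal(draw_a, draw_b, 1000, rng) for _ in range(200)]
```

Averaging `t ** p` over `taus` gives a Monte Carlo estimate of $\mathbb{E}(\tau^p)$, the quantity controlled by Theorem 2.33.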


2.7 Convergence Rates for Positive Recurrent Chains 


We revisit here the ergodic theorems from Sect. 2.5, Theorems 2.27 and 2.28, with 
the help of Theorems 2.33 and 2.34. 

Let $M$ be countable and let $(X_n, Y_n)_{n \ge 0}$ be the canonical chain on $(M \times M)^{\mathbb{N}}$. Let $P$ be an irreducible, aperiodic, and positive recurrent kernel on $M$. If $\pi$ denotes the invariant probability measure of $P$, we have seen in the proofs of Theorems 2.27 and 2.28 that for every probability measure $\mu$ on $M$ and every $x^* \in M$,

$$\sup_{x \in M} |\mu P^n(x) - \pi(x)| \le \mathbb{P}_{\mu \otimes \pi}(\tau_{(x^*, x^*)} > n),$$

where $\mathbb{P}_{\mu \otimes \pi}$ is the Markov measure with kernel $P \otimes P$ and initial distribution $\mu \otimes \pi$, and where $\tau_{(x^*, x^*)} = \inf\{n \ge 1 : X_n = Y_n = x^*\}$.

Let $(\tau_{x^*}^{(k)})_{k \ge 1}$ (respectively $(\tilde\tau_{x^*}^{(k)})_{k \ge 1}$) denote the successive hitting times of $x^*$ by $(X_n)$ (respectively $(Y_n)$). Then, for any probability measures $\alpha, \beta$ on $M$, the processes $T := (\tau_{x^*}^{(k)})_{k \ge 1}$ and $\tilde T := (\tilde\tau_{x^*}^{(k)})_{k \ge 1}$ living on the probability space $((M \times M)^{\mathbb{N}}, \mathcal{B}((M \times M)^{\mathbb{N}}), \mathbb{P}_{\alpha \otimes \beta})$ are two independent renewal processes and $\tau_{(x^*, x^*)}$ is nothing but the first common renewal time for $T$ and $\tilde T$.

The Markov inequality, Theorems 2.33, 2.34, and Proposition 2.10 lead to the following result.

Theorem 2.35 Let $P$ be irreducible, aperiodic, and positive recurrent, with invariant probability measure $\pi$. Let $x^* \in M$.

(i) If $\mathbb{E}_{x^*}(\tau_{x^*}^p) < \infty$ for some $p \ge 2$, then there exists $c > 0$ such that for every probability measure $\mu$ on $M$ and for every $n \in \mathbb{N}^*$,

$$\sup_{x \in M} |\mu P^n(x) - \pi(x)| \le \frac{1}{n^{p-1}}\, c\,(1 + \mathbb{E}_\mu(\tau_{x^*}^{p-1})).$$

(ii) If $\mathbb{E}_{x^*}(e^{\lambda_0 \tau_{x^*}}) < \infty$ for some $\lambda_0 > 0$, then there exist $0 < \lambda < \lambda_0$ and $c > 0$ such that for every probability measure $\mu$ on $M$ and for every $n \in \mathbb{N}$,

$$\sup_{x \in M} |\mu P^n(x) - \pi(x)| \le e^{-\lambda n} c\,(1 + \mathbb{E}_\mu(e^{\lambda_0 \tau_{x^*}})).$$

Combined with Theorem 2.19, Proposition 2.18, and the strong Markov property, 
we recover and extend Theorem 2.28. 


Corollary 2.36 Let $P$ be irreducible, aperiodic, and positive recurrent, with invariant probability measure $\pi$. Let $V : M \to [1, \infty)$ and let $C \subset M$ be as in Theorem 2.19 ((b') or (c)) with $C$ finite. Then

(i) Under condition (b') of Theorem 2.19 for $p \ge 2$, there is $c > 0$ such that for every probability measure $\mu$ on $M$ and for every $n \in \mathbb{N}^*$,

$$\sup_{x \in M} |\mu P^n(x) - \pi(x)| \le \frac{c\,(1 + \mu V^{p-1})}{n^{p-1}};$$

(ii) Under condition (c) of Theorem 2.19, there are $c, \lambda > 0$ such that for every probability measure $\mu$ on $M$ and for every $n \in \mathbb{N}$,

$$\sup_{x \in M} |\mu P^n(x) - \pi(x)| \le e^{-\lambda n} c\,(1 + \mu V).$$
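
The geometric decay in part (ii) can be observed directly on a small example. The sketch below, which is an illustration and not part of the original text, iterates $\mu \mapsto \mu P$ for a hypothetical two-state kernel and evaluates $\sup_x |\mu P^n(x) - \pi(x)|$; the matrix `P`, the initial law `mu0`, and the helper names are assumptions chosen for the example. For this kernel the second eigenvalue is 0.5, so the distance equals $0.6 \cdot 0.5^n$ up to rounding.

```python
def step(mu, P):
    """One step mu -> mu P for a row vector mu and a row-stochastic matrix P."""
    size = len(P)
    return [sum(mu[x] * P[x][y] for x in range(size)) for y in range(size)]

def sup_distance(mu, pi, P, n):
    """sup_x |mu P^n(x) - pi(x)|, computed by n successive multiplications."""
    for _ in range(n):
        mu = step(mu, P)
    return max(abs(m - p) for m, p in zip(mu, pi))

# Hypothetical two-state example: pi = (0.4, 0.6) is invariant for P.
P = [[0.7, 0.3], [0.2, 0.8]]
pi = [0.4, 0.6]
mu0 = [1.0, 0.0]
distances = [sup_distance(mu0, pi, P, n) for n in range(6)]
```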


Notes 


The book by Aldous and Fill [1] contains numerous interesting identities for the
mean hitting times $(\mathbb{E}_i(\tau_j))$, the occupation times $(\mathbb{E}_i(N_j))$ and their relation to
the rate of convergence. Convergence rates for finite Markov chains, in terms of
the geometry of the chain, are thoroughly investigated in the monograph by Saloff- 
Coste [62] and the book by Levin, Peres, and Wilmer [46]. A nice extension of 
Chung's theorem can be found in the recent paper [3]. The coupling method leading 
to the convergence rate Theorem 2.35 goes back to Pitman [55] (see also Lindvall's 
book [47]). 


Chapter 3
Random Dynamical Systems


Whether it is on countable or non-countable state spaces, numerous examples 
of Markov chains are given by random dynamical systems (also called random 
iterative systems). These are systems defined by a recursion of the form $X_{n+1} = F_{\theta_{n+1}}(X_n)$, where $(\theta_n)$ is a sequence of independent identically distributed random
variables. This short chapter discusses their basic properties and the question of 
the representation of a general (respectively Feller) Markov chain by a random 
dynamical system. 


3.1 General Definitions 


Let $(\Theta, \mathcal{A}, m)$ be a probability space,

$$F : \Theta \times M \to M, \qquad (\theta, x) \mapsto F_\theta(x),$$

a measurable map, and $(\theta_n)_{n \ge 1}$ a sequence of independent identically distributed (i.i.d.) $\Theta$-valued random variables having law $m$. Consider an $M$-valued process recursively defined by

$$X_{n+1} := F_{\theta_{n+1}}(X_n) \qquad (3.1)$$

for some given random variable $X_0$.

Proposition 3.1 Assume that $X_0$ is a random variable independent of $(\theta_n)$. Then $(X_n)$ is a Markov chain on $M$ whose Markov kernel is given by

$$P(x, G) = m(\theta \in \Theta : F_\theta(x) \in G). \qquad (3.2)$$

If furthermore $F_\theta$ is continuous for $m$-almost every $\theta$, then $P$ is Feller.


Proof The proof follows (almost) directly from the definitions. Measurability of $x \mapsto P(x, G)$ is a by-product of Fubini's theorem since $P(x, G) = \int_\Theta \mathbf{1}_G \circ F_\theta(x)\, m(d\theta)$. The Feller property follows from continuity under the integral sign. □
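
For a finite state space and a finitely supported law $m$, formula (3.2) can be evaluated by direct enumeration. The following Python sketch is an illustration rather than part of the original text; the function `induced_kernel`, the clamped random walk `clamp_step`, and the choice of states are assumptions made for the example.

```python
from fractions import Fraction

def induced_kernel(states, thetas, weights, F):
    """P(x, y) = m({theta : F(theta, x) = y}) for m = sum_k weights[k] * delta_{thetas[k]}."""
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for theta, w in zip(thetas, weights):
        for x in states:
            P[x][F(theta, x)] += w
    return P

def clamp_step(theta, x):
    """Hypothetical example map: move by theta, staying inside {0, 1, 2}."""
    return min(max(x + theta, 0), 2)

half = Fraction(1, 2)
P = induced_kernel([0, 1, 2], [-1, 1], [half, half], clamp_step)
```

Each row `P[x]` sums to one, as it must for a Markov kernel; here `P[1]` puts mass 1/2 on each of the states 0 and 2.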


The kernel $P$ defined by (3.2) is called the Markov kernel induced by $(F, m)$. The sequence of random maps $(F^n)$ defined by

$$F^n := F_{\theta_n} \circ F_{\theta_{n-1}} \circ \dots \circ F_{\theta_1}$$

is called the random dynamical system (RDS) induced by $(F, m)$.

Note that, by Chapman-Kolmogorov, the law of $F^n(x)$ is determined by $P$ ($F^n(x)$ has law $P^n(x, \cdot)$) but, as shown by the next example, $P$ is not sufficient to characterize the law of $F^n$.


Example 3.2 This example is due to Kifer [43]. Let $M = S^1 = \{z \in \mathbb{C} : |z| = 1\}$ be the unit circle, $\Theta = [0, 1]$, and $m(d\theta) = d\theta$ the uniform Lebesgue measure. Let $f : S^1 \to S^1$ be any, say continuous, map and $F_\theta(z) = e^{2i\pi\theta} f(z)$. Then $P(z, \cdot)$ is the uniform measure on $S^1$ for every $z \in S^1$, but the random dynamical system induced by $(F, m)$ clearly depends on the choice of $f$. For instance, if $f(z) = z$, $F^n$ preserves the distance between points, while for $f(z) = z^2$, $F^n$ locally increases the distance exponentially.
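
The contrast between the two choices of $f$ can be checked numerically. In the sketch below (an illustration, not part of the original text), two nearby points of the circle are driven by the same noise sequence; the helper names, the seed, the number of steps, and the initial separation are assumptions.

```python
import cmath
import math
import random

def F(f, theta, z):
    """F_theta(z) = exp(2 i pi theta) * f(z) on the unit circle."""
    return cmath.exp(2j * math.pi * theta) * f(z)

def distance_after(f, z, w, thetas):
    """|F^n(z) - F^n(w)| when both points are driven by the same thetas."""
    for theta in thetas:
        z, w = F(f, theta, z), F(f, theta, w)
    return abs(z - w)

rng = random.Random(0)
thetas = [rng.random() for _ in range(10)]
z0, w0 = 1 + 0j, cmath.exp(1e-6j)   # two points at angular distance 1e-6
d_identity = distance_after(lambda z: z, z0, w0, thetas)
d_square = distance_after(lambda z: z * z, z0, w0, thetas)
```

With $f(z) = z$ the separation is unchanged, while with $f(z) = z^2$ the angular separation doubles at every step, so after ten steps it has grown by a factor $2^{10}$.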


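Example 3.3 This example is due to Diaconis and Freedman [19]. Let $M = [0, 1]$ be the closed unit interval, and

$$P(x, dy) = \frac{1}{2x} \mathbf{1}_{[0,x]}(y)\, dy + \frac{1}{2(1-x)} \mathbf{1}_{[x,1]}(y)\, dy.$$

Here we adopt the convenient convention that $\frac{1}{x} \mathbf{1}_{[0,x]}(y)\, dy = \delta_0(dy)$ for $x = 0$ and $\frac{1}{1-x} \mathbf{1}_{[x,1]}(y)\, dy = \delta_1(dy)$ for $x = 1$. In words, if the chain is at $x$ it moves to a point $y$ randomly chosen in the right interval $[x, 1]$ (respectively left interval $[0, x]$) with probability 1/2.

Let $F : (0, 1) \times [0, 1] \to [0, 1]$ be defined by

$$F_\theta(x) := 2\theta x\, \mathbf{1}_{\theta < 1/2} + [x + (2\theta - 1)(1 - x)]\, \mathbf{1}_{\theta \ge 1/2}.$$

Then $P$ is induced by $(F, d\theta)$, where $d\theta$ is the Lebesgue measure on $(0, 1)$.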
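
A trajectory of this chain is easy to simulate from the map $F$. The Python sketch below is an illustration and not part of the original text; the function names, the starting point, and the seed are assumptions.

```python
import random

def F(theta, x):
    """Move to 2*theta*x (left of x) if theta < 1/2,
    else to x + (2*theta - 1)*(1 - x) (right of x)."""
    if theta < 0.5:
        return 2.0 * theta * x
    return x + (2.0 * theta - 1.0) * (1.0 - x)

def trajectory(x0, n, rng):
    """X_0, ..., X_n with X_{k+1} = F(theta_{k+1}, X_k) and theta_k i.i.d. uniform."""
    xs = [x0]
    for _ in range(n):
        xs.append(F(rng.random(), xs[-1]))
    return xs

path = trajectory(0.3, 1000, random.Random(0))
```

Since $2\theta$ is uniform on $[0, 1)$ given $\theta < 1/2$, the first branch picks a uniform point of $[0, x)$; the second branch does the same on $[x, 1]$.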
Exercise 3.4 (Additive Noise) Suppose $M = \mathbb{R}^n$ (or an abelian locally compact group), $\Theta = M$, $F : M \to M$,

$$F_\theta(x) = F(x) + \theta$$

and $m(d\theta) = h(\theta)\, d\theta$ with $h \in L^1(d\theta)$. Here $d\theta$ stands for the Lebesgue measure (or the Haar measure) on $M$. Let $P$ denote the corresponding Markov kernel given by (3.2).

Given $x \in M$, let $U_x : L^1(dx) \to L^1(dx)$ be the translation operator defined as $U_x(g)(y) := g(y - x)$. Show that for all $f \in B(M)$,

$$|Pf(x) - Pf(y)| \le \|f\|_\infty \|U_{F(x)}(h) - U_{F(y)}(h)\|_1.$$

Deduce that $P$ is strong Feller whenever $F$ is continuous. One can use (or better, prove) that for all $g \in L^1(dx)$, the map $x \in M \mapsto U_x(g) \in L^1(dx)$ is continuous.


3.2 Representation of Markov Chains by RDS 


Proposition 3.1 shows that every RDS defines a Markov chain. Here we briefly 
discuss the converse problem and consider the question of representing a Markov 
chain by a suitable RDS. 

A transformation space is a set of maps f : M — M closed under composition. 
Let T be a transformation space and P a Markov kernel on M. 

We say that $P$ can be represented by $T$ if there exists a probability space $(\Theta, \mathcal{A}, m)$ and a measurable map $F : \Theta \times M \to M$ such that

(i) $F_\theta \in T$ for all $\theta \in \Theta$;
(ii) $P$ is induced by $(F, m)$.

Recall that a separable metric space $M$ is called Polish if it is complete. The following result is folklore.

Theorem 3.5 If $M$ is a Borel subset of a Polish space, then any Markov kernel on $M$ can be represented by a space $T$ of measurable maps with $(\Theta, \mathcal{A}, m) = ((0, 1), \mathcal{B}((0, 1)), \lambda)$ and $\lambda$ the Lebesgue measure on $(0, 1)$.


Proof When $M$ is a Borel subset of $\mathbb{R}$, the proof is constructive and makes $F$ explicit. Indeed, let $G_x$ be the cumulative distribution function of $P(x, \cdot)$, i.e.,

$$G_x(t) = P(x, (-\infty, t]).$$

For all $\theta \in (0, 1)$ and $x \in M$, set

$$F_\theta(x) := G_x^{-1}(\theta),$$

where $G_x^{-1} : (0, 1) \to \mathbb{R}$, the generalized inverse of $G_x$, is defined as

$$G_x^{-1}(u) := \inf\{t \in \mathbb{R} : G_x(t) \ge u\}.$$

Then

$$\lambda(\theta \in (0, 1) : F_\theta(x) \le t) = \lambda(\theta \in (0, 1) : \theta \le G_x(t)) = G_x(t).$$

The proof in the general case follows from the following abstract result of measure theory: Every Borel subset $M$ of a Polish space is isomorphic to a Borel subset of $[0, 1]$. That is, there exists a Borel set $\tilde M \subset [0, 1]$ and a bi-measurable bijection $\Psi : M \to \tilde M$ (meaning that both $\Psi$ and its inverse are Borel measurable). Chapter 13 of Dudley's book [21] contains a detailed proof of this result. Exercise 4.11 treats the particular case where $M$ is compact or locally compact.

Given such a $\Psi$ and a Markov kernel $P$ on $M$, let $\tilde P$ be the Markov kernel on $\tilde M$ defined as $\tilde P(x, A) := P(\Psi^{-1}(x), \Psi^{-1}(A))$. Then $\tilde P$ is induced by $(\tilde F, \lambda)$ for some measurable $\tilde F : (0, 1) \times \tilde M \to \tilde M$ so that $P$ is induced by $(F, \lambda)$ with $F_\theta(x) = \Psi^{-1}(\tilde F_\theta(\Psi(x)))$. □
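
When $P(x, \cdot)$ has finite support in $\mathbb{R}$, the generalized inverse used in the constructive part of the proof reduces to a table lookup. The sketch below (an illustration under this assumption, not part of the original text) implements $\inf\{t : G(t) \ge u\}$ for a discrete law; the function names and the example kernel are hypothetical.

```python
import bisect
import random
from itertools import accumulate

def generalized_inverse(points, probs, u):
    """inf{t : G(t) >= u} for the law sum_k probs[k] * delta_{points[k]}.
    Assumes points sorted increasingly, probs > 0 summing to 1, 0 < u < 1."""
    cdf = list(accumulate(probs))
    k = bisect.bisect_left(cdf, u)            # first index with cdf[k] >= u
    return points[min(k, len(points) - 1)]    # guard against rounding in cdf[-1]

def F(theta, x, kernel):
    """F_theta(x) := G_x^{-1}(theta), where kernel[x] = (points, probs)."""
    points, probs = kernel[x]
    return generalized_inverse(points, probs, theta)

# Hypothetical kernel on M = {0.0, 1.0, 2.0}.
kernel = {
    0.0: ([0.0, 1.0], [0.5, 0.5]),
    1.0: ([0.0, 1.0, 2.0], [0.2, 0.5, 0.3]),
    2.0: ([1.0, 2.0], [0.5, 0.5]),
}
rng = random.Random(0)
draws = [F(rng.random(), 1.0, kernel) for _ in range(20000)]
```

Feeding uniform variables into `F` reproduces the rows of the kernel: about half of `draws` equal 1.0, in line with the weight 0.5 in the row of the state 1.0.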


Blumenthal and Corson [12] prove the following result (see also Kifer [43], 
Theorem 1.2). 


Theorem 3.6 ([12]) Let $M$ be a connected and locally connected compact metric space. Let $P$ be a Feller Markov kernel such that $P(x, \cdot)$ has full support for all $x \in M$, i.e., for all $x \in M$ and for every closed set $F$ strictly contained in $M$, we have $P(x, F) < 1$. Then $P$ may be represented by $T = C^0(M, M)$ (the space of continuous maps $f : M \to M$).


The question of representation by smooth maps has been considered by Quas 
[58]. Before stating Quas's theorem, we state a result due to Jürgen Moser from 
which it will be deduced. 

Let $M$ be a smooth ($C^\infty$) compact orientable Riemannian manifold without boundary, with normalized Riemannian probability measure $\lambda$. If $\rho : M \to \mathbb{R}_+$ is a $C^1$-density on $M$ and $\Phi : M \to M$ a $C^1$-diffeomorphism, we let $\Phi^*\rho$ denote the image of $\rho$ by $\Phi$, i.e.,

$$(\Phi^*\rho)(\Phi(x)) = \frac{\rho(x)}{|J\Phi(x)|},$$

where $J\Phi(x)$ is the Jacobian of $\Phi$, i.e., the determinant of the derivative $D\Phi(x) : T_x M \to T_{\Phi(x)} M$. In other words, if $X$ is a random variable with density $\rho$, then $\Phi(X)$ is a random variable with density $\Phi^*\rho$.

In 1965, Moser [50], using the "homotopy trick" argument, proved part (i) of the following result in the $C^\infty$ case. For every positive integer $k$ and $0 \le \alpha < 1$, we let $C^{k+\alpha}(M)$ denote the space of $C^{k+\alpha}$ ($C^k$ with $\alpha$-Hölder $k$th derivatives if $\alpha > 0$) functions $h : M \to \mathbb{R}$ endowed with the $C^{k+\alpha}$-topology,

$$E_t^{k+\alpha} := \Big\{h \in C^{k+\alpha}(M) : \int_M h(x)\, \lambda(dx) = t\Big\},$$

and $D^{k+\alpha} := \{\rho \in E_1^{k+\alpha} : \rho(x) > 0\ \forall x \in M\}$ the space of positive $C^{k+\alpha}$-densities. Plainly, $E_1^{k+\alpha}$ is a closed subset of $C^{k+\alpha}(M)$, which can be identified with the Banach space $E_0^{k+\alpha}$, and $D^{k+\alpha}$ is an open subset of $E_1^{k+\alpha}$.

Theorem 3.7 ([50]) Let $\rho_0$ be a positive $C^k$-density for some $k \ge 1$. Then

(i) For any positive $C^k$-density $\rho$, there exists a $C^k$-diffeomorphism $\Phi_\rho$ on $M$ with the property that

$$\Phi_\rho^* \rho_0 = \rho;$$

(ii) The $C^k$-diffeomorphism $\Phi_\rho$ from part (i) can be chosen in such a way that the mapping

$$D^k \times M \to M, \qquad (\rho, x) \mapsto \Phi_\rho(x)$$

is $C^k$.


Proof Let $\rho_t = \rho_0 + t(\rho - \rho_0)$ for $0 \le t \le 1$. We look for a family of diffeomorphisms $(\Phi_t)_{t \in [0,1]}$ such that $\Phi_t^* \rho_0 = \rho_t$ for all $t \in [0, 1]$. That is,

$$j(t, x)\, \rho_t(\Phi_t(x)) = \rho_0(x), \qquad (3.3)$$

where $j(t, x)$ is the Jacobian of $\Phi_t$, evaluated at $x$. More precisely, we look for a family of vector fields $(X_t)_{t \in [0,1]}$ on $M$ such that $\Phi_t(x)$ is the solution to the non-autonomous Cauchy problem

$$\frac{dy}{dt} = X_t(y)$$

with initial condition $y(0) = x$. Using Jacobi's formula for the derivative of the determinant of a matrix-valued function, one obtains that $j(t, x)$ solves

$$\frac{\partial j}{\partial t} = \operatorname{div}(X_t)(\Phi_t(x))\, j$$

with initial condition $j(0, \cdot) = 1$. Thus, taking the time derivative of (3.3) and setting $y := \Phi_t(x)$, $\eta := \rho_0 - \rho$ gives

$$\operatorname{div}(X_t)(y)\, \rho_t(y) - \eta(y) + \langle \nabla \rho_t(y), X_t(y) \rangle = 0.$$

Hence

$$\operatorname{div}(\rho_t X_t)(y) = \eta(y).$$

If one sets $X_t = \nabla U / \rho_t$, the problem reduces to finding a function $U : M \to \mathbb{R}$ such that

$$\Delta U = \operatorname{div}(\nabla U) = \eta, \qquad (3.4)$$

where one should recall that $\eta = \rho_0 - \rho$. Since

$$\int_M \eta(x)\, \lambda(dx) = 0,$$

(3.4) admits a solution, and we may define $\Delta^{-1}\eta$ as the particular solution

$$x \mapsto -\int_0^\infty Q_t \eta(x)\, dt,$$

where $Q_t \eta(x) := \mathbb{E}(\eta(W_t) \mid W_0 = x)$ and $(W_t)$ is a Brownian motion on $M$. Furthermore, by Schauder estimates (see, e.g., Chapter 6 in [30]) $\Delta^{-1}$ maps $E_0^{k-1+\alpha}$ continuously into $C^{k+1+\alpha}(M)$ for every positive integer $k$ and $0 < \alpha < 1$. This makes the vector field

$$X_t^\rho = \nabla U / \rho_t$$

a $C^k$-vector field. It also implies that the continuous mapping

$$[0, 1] \times D^k \times M \to TM, \qquad (t, \rho, x) \mapsto X_t^\rho(x)$$

is $C^k$.

Let $t \mapsto \Phi_t(\rho, x)$ denote the solution to the Cauchy problem $\frac{dy}{dt} = X_t^\rho(y)$ with initial condition $\Phi_0(\rho, x) = x$. It then follows from standard results on differential equations that $x \mapsto \Phi_t(\rho, x)$ is a $C^k$-diffeomorphism for all $(t, \rho) \in [0, 1] \times D^k$, and that $(x, \rho) \mapsto \Phi_t(\rho, x)$ is $C^k$ for all $t \in [0, 1]$. To conclude the proof, set $\Phi_\rho(x) := \Phi_1(\rho, x)$. □

From Moser's theorem we deduce the following result proved by Quas [58] in the $C^\infty$ case.


Corollary 3.8 ([58]) Let $P$ be a Markov kernel on $M$, a smooth compact orientable connected Riemannian manifold without boundary. Assume that for each $x \in M$, $P_x = P(x, \cdot)$ has a $C^k$, $k \ge 1$, positive density $\rho_x$ with respect to the Riemannian measure, and that $x \in M \mapsto \rho_x \in D^k$ is $C^r$, $r \ge 0$. Then $P$ may be represented by $T = C^r(M, M)$.


Proof Let $\rho_0 = \rho_{x_0}$ for some $x_0 \in M$ and let $\Psi_x = \Phi_{\rho_x}$ denote the $C^k$-diffeomorphism produced by Moser's Theorem (Theorem 3.7). Then

$$P(x, G) = P(x_0, \Psi_x^{-1}(G)).$$

Let $T = C^r(M, M)$ and let $f_y \in T$ be defined by $f_y(x) := \Psi_x(y)$. Then

$$P(x, G) = m(f \in T : f(x) \in G),$$

where $m$ is the image of $P_{x_0}$ by the mapping $y \in M \mapsto f_y \in T$. □


Exercise 3.9 (Bernoulli Convolutions) Bernoulli convolutions are very simple, yet fascinating, examples of random dynamical systems.

Let $0 < a < 1$ and let $(X_n)$ be the sequence of real-valued random variables recursively defined by

$$X_{n+1} = aX_n + \theta_{n+1},$$

where $(\theta_n)$ is a sequence of i.i.d. random variables taking values in $\{-1, 1\}$, independent of $X_0$, and having uniform distribution $m = \frac{\delta_{-1} + \delta_1}{2}$.

Set $Y_n = \sum_{i=0}^{n-1} a^i \theta_{i+1}$ and let

$$Y = \lim_{n \to \infty} Y_n = \sum_{i=0}^{\infty} a^i \theta_{i+1}.$$

Throughout, we let $\mu_a$ denote the law of $Y$ and $F_a$ its cumulative distribution function (cdf) defined as $F_a(t) = \mu_a((-\infty, t])$.

(i) Show that $X_n - a^n X_0$ and $Y_n$ have the same law and deduce that $(X_n)$ converges in law to $\mu_a$, i.e.,

$$\lim_{n \to \infty} \mathbb{E}(f(X_n)) = \mu_a f$$

for all $f \in C_b(\mathbb{R})$. Convergence in law will be further discussed in Sect. 4.1 of Chap. 4.

(ii) Show that $F_a$ is the unique cdf solution to the functional equation

$$F(t) = \frac{1}{2}\left[F\left(\frac{t-1}{a}\right) + F\left(\frac{t+1}{a}\right)\right].$$

(iii) Show that $F_a$ is continuous.

(iv) (Law of pure types) Recall that $\mu_a$ is called absolutely continuous (with respect to Lebesgue measure) if every Borel set having zero Lebesgue measure has zero $\mu_a$-measure. By the Radon-Nikodym theorem, this amounts to

$$F_a(t) = \int_{-\infty}^{t} f_a(u)\, du$$

for some nonnegative function $f_a \in L^1(\mathbb{R})$. The measure $\mu_a$ is called singular if $\mu_a(N) = 1$ for some Borel set $N$ having zero Lebesgue measure. Show that $\mu_a$ is either absolutely continuous or singular (compare with Lemma 4.26 in Chap. 4).

(v) (Devil's staircase) The topological support of $\mu_a$ is the set of $t \in \mathbb{R}$ such that $\mu_a(I) > 0$ for every open interval $I$ containing $t$. Equivalently, this is the set of $t \in \mathbb{R}$ at which $F_a$ strictly increases.

Suppose $a < \frac{1}{2}$. Show that the support of $\mu_a$ is a Cantor set having zero Lebesgue measure. In this case $F_a$ is a Devil's staircase: a continuous function increasing from 0 to 1 but almost everywhere nonincreasing.

(vi) Show that $\mu_{1/2}$ is the uniform distribution over $[-2, 2]$.

(vii) Show that for $a > \frac{1}{2}$, the support of $\mu_a$ is the interval $\left[-\frac{1}{1-a}, \frac{1}{1-a}\right]$.
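
The law $\mu_a$ can be explored by simulating the partial sums $Y_n$. The following Python sketch is an illustration and not part of the exercise; the truncation level `n`, the sample size, the seed, and the function names are assumptions. For $a = 1/2$ the empirical cdf should be close to that of the uniform law on $[-2, 2]$, in line with item (vi).

```python
import random

def partial_sum(a, signs):
    """Y_n = sum_{i=0}^{n-1} a^i * theta_{i+1} for signs = (theta_1, ..., theta_n)."""
    return sum(a ** i * s for i, s in enumerate(signs))

def sample_Y(a, n, rng):
    """One draw of Y_n with i.i.d. uniform signs in {-1, +1}."""
    return partial_sum(a, [rng.choice((-1, 1)) for _ in range(n)])

def empirical_cdf(samples, t):
    """Fraction of samples that are <= t."""
    return sum(s <= t for s in samples) / len(samples)

rng = random.Random(0)
samples = [sample_Y(0.5, 40, rng) for _ in range(20000)]
```

Changing the first argument of `sample_Y` to a value below 1/2 and plotting a histogram of the samples gives a picture of the Cantor-like support discussed in item (v).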


Remark 3.10 The study of Bernoulli convolutions has a long history. It started around 1930 with the work of Wintner and his collaborators Jessen and Kershner (see, e.g., [53] for a comprehensive bibliography). As seen in the previous exercise, when $a > \frac{1}{2}$, $F_a$ is continuous and strictly increasing on $\left[-\frac{1}{1-a}, \frac{1}{1-a}\right]$. Wintner proved that it is $C^{k-1}$ for $a = 2^{-1/k}$ and $k \ge 2$, but Erdős [25] in 1939 proved that whenever $\frac{1}{a}$ is a Pisot number, then $\mu_a$ is singular! A Pisot number is a real algebraic integer greater than 1 (i.e., a root of a monic polynomial having integer coefficients) whose conjugates (i.e., the other roots of the polynomial) have modulus $< 1$. For instance, the golden number $g = \frac{1 + \sqrt{5}}{2}$ is a Pisot number as the root of the polynomial $x^2 - x - 1$.

After Erdős, the question of describing the set of $a > \frac{1}{2}$ for which $\mu_a$ is absolutely continuous has challenged the community. In 1995 Solomyak [64] (see also the beautiful short proof by Solomyak and Peres [54]) proved the remarkable result that for almost all $a > \frac{1}{2}$, $\mu_a$ is absolutely continuous.


Exercise 3.11 (The Propp and Wilson Algorithm) The representation of a Markov chain by an RDS can obviously be used to simulate trajectories of a given finite Markov chain. More surprisingly, it can also serve to sample exactly and in finite time the invariant probability measure of a positive recurrent finite chain. This is the Propp and Wilson algorithm introduced by J. Propp and D. Wilson [57] in 1996.

Let $M$ be a finite set and let $(F^n)$ be an RDS on $M$. Recall that this means that

$$F^n = F_{\theta_n} \circ \dots \circ F_{\theta_1},$$

where $(\theta_i)$ is a sequence of i.i.d. random variables on some probability space $(\Theta, \mathcal{A}, m)$ and $\Theta \times M \ni (\theta, x) \mapsto F_\theta(x)$ is a measurable map. Associated to $F^n$ is the right product

$$R^n = F_{\theta_1} \circ \dots \circ F_{\theta_n}.$$

A map $f : M \to M$ is called constant if $f(x) = f(y)$ for all $x, y \in M$. We let Cst denote the set of such maps, and

$$T_c = \min\{n \ge 1 : R^n \in \mathrm{Cst}\}.$$

(i) Show that $R^n$ and $F^n$ have the same distribution.

(ii) Suppose that $T_c$ is almost surely finite. Let $Z = R^{T_c}(x)$ (which is independent of $x$). Show that for all $n \ge T_c$ and $y \in M$, $R^n(y) = Z$. Deduce that the law of $Z$ is the unique invariant probability measure of the chain induced by $(F^n)$.

(iii) Suppose that for some $a > 0$, $m(\{\theta \in \Theta : F_\theta \in \mathrm{Cst}\}) \ge a$. Show that $T_c$ has a geometric tail, and is therefore almost surely finite.

(iv) Suppose, more generally, that for some $a > 0$ and every subset $A \subset M$ having cardinality $|A| \ge 2$,

$$m(\{\theta \in \Theta : |F_\theta(A)| < |A|\}) \ge a.$$

Show that $T_c$ has a geometric tail and is therefore almost surely finite.

(v) Suppose now that $P$ is a Markov transition matrix on $M$ having positive entries. Show that it is always possible to represent it by an RDS such that the condition assumed in question (iii) is satisfied. Explain how this can be used to produce an algorithm which samples the invariant probability measure of $P$ in finite time.

(vi) Let $M = \{0, 1\}$ and let $P$ be the Markov transition matrix defined by $P(x, y) = \frac{1}{2}$. Let $\Theta = \{0, 1\}$, $m = \frac{1}{2}(\delta_0 + \delta_1)$, and $F_\theta(x) = \theta x + (1 - \theta)(1 - x)$. Show that the Markov kernel $P$ is represented by $(F, m)$ but that $T_c = \infty$ almost surely.
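
The construction of $Z$ from the right products $R^n$ translates into a short program. The Python sketch below is a naive illustration and not the implementation of [57]: it recomputes $R^n$ from scratch at every step (quadratic cost), and the function names, the `max_steps` cutoff, and the clamped random walk used as an example are assumptions.

```python
import random

def coalescence_sample(states, F, draw_theta, rng, max_steps=10_000):
    """Extend R^n = F_{theta_1} o ... o F_{theta_n} until it is a constant map.
    Returns (Z, T) with Z the common value and T the coalescence time,
    or (None, None) if no coalescence occurs within max_steps."""
    thetas = []
    for n in range(1, max_steps + 1):
        thetas.append(draw_theta(rng))        # theta_n is applied first (innermost)
        image = set()
        for x in states:
            y = x
            for theta in reversed(thetas):    # F_{theta_n} first, F_{theta_1} last
                y = F(theta, y)
            image.add(y)
        if len(image) == 1:
            return image.pop(), n
    return None, None

def clamp_step(theta, x):
    """Hypothetical example map on {0, 1, 2}: move by theta in {-1, +1}, clamped."""
    return min(max(x + theta, 0), 2)

rng = random.Random(1)
draw_sign = lambda r: r.choice((-1, 1))
counts = {0: 0, 1: 0, 2: 0}
for _ in range(3000):
    z, _t = coalescence_sample([0, 1, 2], clamp_step, draw_sign, rng)
    counts[z] += 1
```

The kernel induced by `clamp_step` is doubly stochastic, so its invariant law is uniform on {0, 1, 2} and the three counts should be roughly equal. New noise is appended on the inside of the composition; composing on the outside (that is, using $F^n$ instead of $R^n$) would not freeze the value once the map becomes constant.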
Notes


The proof of Erdős's theorem on Bernoulli convolutions (see Remark 3.10) as well
as numerous illustrating simulations can be found in the first chapter of [7]. For
(much) more on Bernoulli convolutions we recommend the survey papers [53] and
[67]. The book [46] contains a full chapter on the Propp and Wilson algorithm
including many examples of applications.


Chapter 4
Invariant and Ergodic Probability Measures


Invariant and ergodic probability measures are at the heart of the (ergodic) theory 
of Markov chains. This chapter starts with a brief summary of weak convergence 
theory, which we will use throughout the book. We then define invariant measures 
and show that limit points of the empirical occupation measures of a Feller chain 
are invariant probability measures. The rest of this chapter is devoted to ergodicity. 
Basic properties of ergodic measures are established and unique ergodicity of 
“random contractions” is proved. An entire section is devoted to the fundamental 
results of deterministic ergodic theory, namely the Poincaré recurrence theorem, 
the Birkhoff ergodic theorem, and the ergodic decomposition theorem. In another 
section, we present the Markovian versions of these results. In the final section, it is 
shown how the theory can be adapted to deal with continuous-time processes. 


4.1 Weak Convergence of Probability Measures 


Let $\mathcal{P}(M)$ denote the set of probability measures on $(M, \mathcal{B}(M))$. A sequence $\{\mu_n\} \subset \mathcal{P}(M)$ is said to converge weakly to $\mu \in \mathcal{P}(M)$, written

$$\mu_n \Rightarrow \mu,$$

provided

$$\lim_{n \to \infty} \mu_n f = \mu f$$

for all $f \in C_b(M)$. The following theorem, known as the Portmanteau theorem, gives equivalent conditions for weak convergence. Note that this theorem is true in any metric space (without assumption of separability or completeness).

Let $U_b(M) \subset C_b(M)$ (resp. $L_b(M) \subset U_b(M)$) denote the set of bounded and uniformly continuous (resp. bounded and Lipschitz) mappings $f : M \to \mathbb{R}$.


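Theorem 4.1 (Portmanteau Theorem) Let $\{\mu_n\} \subset \mathcal{P}(M)$ and $\mu \in \mathcal{P}(M)$. The following conditions are equivalent:

(a) $\mu_n \Rightarrow \mu$;

(b) $\mu_n f \to \mu f$ for all $f \in U_b(M)$;

(c) $\mu_n f \to \mu f$ for all $f \in L_b(M)$;

(d) $\limsup_{n \to \infty} \mu_n(F) \le \mu(F)$ for all closed sets $F \subset M$;

(e) $\liminf_{n \to \infty} \mu_n(O) \ge \mu(O)$ for all open sets $O \subset M$;

(f) $\lim_{n \to \infty} \mu_n(A) = \mu(A)$ for all $A \in \mathcal{B}(M)$ such that $\mu(\partial A) = 0$, where $\partial A := \bar A \setminus \mathrm{int}(A)$ denotes the boundary of $A$.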


Proof (a) $\Rightarrow$ (b) $\Rightarrow$ (c) is clear and (d) $\Leftrightarrow$ (e) holds by set complementation.

Assume (c). Let $F$ be a closed set, $\varepsilon > 0$, and $f_\varepsilon(x) := (1 - d(x, F)/\varepsilon)^+$, where $d(x, F) := \inf_{y \in F} d(x, y)$. Then $1 \ge f_\varepsilon \ge \mathbf{1}_F$ and $f_\varepsilon \in L_b(M)$. Thus, $\limsup \mu_n(F) \le \limsup \mu_n f_\varepsilon = \mu f_\varepsilon$ and, by dominated convergence, $\mu f_\varepsilon \to \mu(F)$ as $\varepsilon \to 0$. This proves that (c) $\Rightarrow$ (d).

Assume (d) (and thus also (e)). Let $A \in \mathcal{B}(M)$ be such that $\mu(\partial A) = 0$. Let $F$ be the closure of $A$ and $O$ its interior. Then $\mu(F) = \mu(O)$ and, by (d) and (e), $\liminf \mu_n(A) \ge \liminf \mu_n(O) \ge \mu(O)$ and $\limsup \mu_n(A) \le \limsup \mu_n(F) \le \mu(F)$. This proves that (d), (e) $\Rightarrow$ (f).

It remains to show that (f) $\Rightarrow$ (a). Assume (f) and let $f \in C_b(M)$. Replacing $f$ by $f + c$ for some $c > 0$ if necessary, we can assume that $f \ge 0$. For all $a > 0$, the set $\{f > a\}$ is open and its boundary is contained in $\{f = a\}$. Furthermore, the set of $a > 0$ such that $\mu(\{f = a\}) > 0$ is at most countable (as the set of discontinuity points of the cumulative distribution function $a \mapsto \mu(\{f \le a\})$). Thus, by Fubini's theorem, (f), and dominated convergence,

$$\mu_n f = \int_0^{\|f\|_\infty} \mu_n(f > a)\, da \to \int_0^{\|f\|_\infty} \mu(f > a)\, da = \mu f.$$ □
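
The role of the boundary condition in (f) is visible on the simplest example, $\mu_n = \delta_{1/n} \Rightarrow \delta_0$ on the real line. The Python sketch below is an illustration and not part of the original text; the test function `tent` and the set $A = (0, 1]$ are choices made for the example. Integrating against a Dirac mass is evaluation at the point.

```python
def tent(x):
    """A bounded Lipschitz test function on the real line."""
    return max(0.0, 1.0 - abs(x))

def indicator_A(x):
    """Indicator of A = (0, 1]; the boundary {0, 1} of A carries the limit mass."""
    return 1.0 if 0.0 < x <= 1.0 else 0.0

# mu_n f = f(1/n) for mu_n the Dirac mass at 1/n; the limit measure is delta_0.
mu_n_tent = [tent(1.0 / n) for n in range(1, 1001)]
mu_n_A = [indicator_A(1.0 / n) for n in range(1, 1001)]
mu_tent, mu_A = tent(0.0), indicator_A(0.0)
```

Here `mu_n_tent` converges to `mu_tent`, as required by condition (c), whereas `mu_n_A` is constantly 1 while `mu_A` is 0. This does not contradict (f), since $\delta_0(\partial A) = 1$.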


The following corollary is often useful. 


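Corollary 4.2 Let $f \in B(M)$ and let $D_f$ denote the set of discontinuities of $f$. If $\mu_n \Rightarrow \mu$ and $\mu(D_f) = 0$, then $\mu_n f \to \mu f$.


Proof Let $\mu_n^f := \mu_n(f^{-1}(\cdot))$ be the image measure of $\mu_n$ by $f$. It suffices to show that $\mu_n^f \Rightarrow \mu^f$. Indeed, let $g(t) := t$ for $|t| \le \|f\|_\infty$, and $g(t) := \mathrm{sign}(t)\|f\|_\infty$ for $|t| > \|f\|_\infty$. Then $\mu_n^f g = \mu_n f$ and $\mu^f g = \mu f$. To prove that $\mu_n^f \Rightarrow \mu^f$, we rely on assertion (d) of the Portmanteau theorem. Let $F$ be a closed subset of $\mathbb{R}$. Then $\limsup \mu_n^f(F) \le \limsup \mu_n(\overline{f^{-1}(F)}) \le \mu(\overline{f^{-1}(F)})$ because $\mu_n \Rightarrow \mu$. Now, $\overline{f^{-1}(F)} \subset D_f \cup f^{-1}(F)$ so that $\mu(\overline{f^{-1}(F)}) = \mu(f^{-1}(F)) = \mu^f(F)$. □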


Exercise 4.3 For $\varepsilon, \delta > 0$ let $A_{\varepsilon,\delta}$ be the set of $x \in M$ such that $|f(y) - f(z)| > \varepsilon$ for some $y, z \in B(x, \delta)$. Show that $D_f = \bigcup_{n \in \mathbb{N}^*} \bigcap_{m \in \mathbb{N}^*} A_{1/n, 1/m}$ and that $D_f$ is measurable (even if $f$ is not).

Exercise 4.4 Let $P$ be a Markov kernel on a metric space $M$. Show that $P$ is Feller if and only if the map $M \to \mathcal{P}(M)$, $x \mapsto P(x, \cdot)$ is continuous (where $\mathcal{P}(M)$ is equipped with the topology of weak convergence).


The space P (M) equipped with the topology of weak convergence is actually a 
metric space, as shown by the next proposition. 


Proposition 4.5 There exists a countable family $\{f_n\}_{n \ge 0} \subset C_b(M)$ such that

$$D(\mu, \nu) = \sum_{n \ge 0} \frac{1}{2^n} \min(|\mu f_n - \nu f_n|, 1)$$

is a distance on $\mathcal{P}(M)$ whose induced topology is the topology of weak convergence. That is, $\mu_n \Rightarrow \mu$ if and only if $D(\mu_n, \mu) \to 0$.


Remark 4.6 Unless $M$ is compact, the family $\{f_n\}_{n \ge 0}$ is not dense in $C_b(M)$ (see Exercise 4.10).


Proof If $M$ is compact, $C_b(M)$ is separable (see Exercise 4.9) and it suffices to choose a dense sequence $\{f_n\} \subset C_b(M)$. If $M$ is not compact, $C_b(M)$ is no longer separable (see Exercise 4.10), but we shall prove that there exists a metric $\tilde d$ on $M$, topologically equivalent to $d$, making $M$ homeomorphic to a subset of a compact metric space. It will then follow that $U_b(M, \tilde d)$, the space of bounded uniformly continuous functions on $(M, \tilde d)$, is separable. (Here one should recall that two topologically equivalent metrics may yield distinct sets of uniformly continuous functions.)

Replacing $d$ by $\frac{d}{1+d}$ (which remains a distance on $M$ inducing the same topology as $d$), we can assume that $d < 1$. Let $\{a_n\}_{n \ge 0} \subset M$ be countable and dense, and let $H : M \to [0, 1]^{\mathbb{N}}$ be the map defined by

$$H(x) := (d(x, a_n))_{n \ge 0}.$$

By Tychonoff's theorem (see, e.g., Theorem 2.2.8 in [21]), $[0, 1]^{\mathbb{N}}$ is a compact metric space. A metric for $[0, 1]^{\mathbb{N}}$ is given by

$$e(x, y) := \sum_{k \ge 0} \frac{|x_k - y_k|}{2^k},$$

where $x = (x_k)_{k \ge 0}$, $y = (y_k)_{k \ge 0}$. Set

$$\tilde d(x, y) := e(H(x), H(y)).$$

It is not hard to check that $\tilde d$ is a metric on $M$ inducing the same topology as $d$. The spaces $(M, \tilde d)$ and $(H(M), e)$ are thus isometric. Let $K := \overline{H(M)}$. Then $K$ is compact (as a closed subset of a compact space) and thus, there exists a countable and dense family $\{g_n\} \subset C_b(K)$. Let $f \in U_b(M, \tilde d)$. Since $H$ is an isometry, the map $f \circ H^{-1} : H(M) \to \mathbb{R}$ is uniformly continuous. It then extends to a continuous map $\bar f : \overline{H(M)} \to \mathbb{R}$. By density of $\{g_n\}$, there exists, for all $\varepsilon > 0$, some $n$ such that

$$\|f - g_n \circ H\|_\infty = \sup_{x \in H(M)} |f \circ H^{-1}(x) - g_n(x)| \le \sup_{x \in K} |\bar f(x) - g_n(x)| \le \varepsilon.$$

This proves that the sequence $\{f_n\}$, with $f_n := g_n \circ H$, is dense in $U_b(M, \tilde d)$. Now, by Theorem 4.1 (b) and density of $\{f_k\}$, $\mu_n \Rightarrow \mu$ if and only if $\mu_n f_k \to \mu f_k$ for all $k \in \mathbb{N}$. This is equivalent to $D(\mu_n, \mu) \to 0$. □


One of the main advantages of the distance defined in Proposition 4.5 is that it allows one to verify weak convergence by testing the condition $\mu_n f \to \mu f$ over a countable set of functions.

Two other classical distances over P (M) are the following: 


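Prohorov Metric For any $A \subset M$ and $\varepsilon > 0$, let

$$A^\varepsilon := \{y \in M : d(y, A) < \varepsilon\}.$$

For all $\mu, \nu \in \mathcal{P}(M)$ the Prohorov distance (also called the Lévy-Prohorov distance) between $\mu$ and $\nu$ is defined as

$$\pi(\mu, \nu) := \inf\{\varepsilon > 0 : \mu(A) \le \nu(A^\varepsilon) + \varepsilon \text{ for all } A \in \mathcal{B}(M)\}. \qquad (4.1)$$


Fortet-Mourier Metric Let $L_b(M) \subset C_b(M)$ be the space of bounded Lipschitz maps equipped with the norm

$$\|f\|_{bl} = \|f\|_\infty + \mathrm{Lip}(f),$$

where

$$\mathrm{Lip}(f) := \sup\left\{\frac{|f(x) - f(y)|}{d(x, y)} : x, y \in M,\ x \ne y\right\}.$$

For all $\mu, \nu \in \mathcal{P}(M)$ the Fortet-Mourier distance between $\mu$ and $\nu$ is defined as

$$\rho(\mu, \nu) := \sup\{|\mu f - \nu f| : f \in L_b(M),\ \|f\|_{bl} \le 1\}. \qquad (4.2)$$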


Theorem 4.7 The maps $\pi$ and $\rho$ are distances on $\mathcal{P}(M)$. Let $\{\mu_n\} \subset \mathcal{P}(M)$ and $\mu \in \mathcal{P}(M)$. The following conditions are equivalent:

(a) $\mu_n \Rightarrow \mu$;
(b) $\rho(\mu_n, \mu) \to 0$;
(c) $\pi(\mu_n, \mu) \to 0$.

Proof We only prove that (a) $\Leftrightarrow$ (b). For more details and the proof of (b) $\Leftrightarrow$ (c), see Dudley [21]. The implication (b) $\Rightarrow$ (a) follows from assertion (c) of Theorem 4.1. Conversely assume (a). We first assume that $M$ is complete. Fix $\varepsilon > 0$. By Ulam's Theorem (or Prohorov's Theorem 4.13 below), one can choose $K \subset M$ compact such that

$$\mu(K) > 1 - \varepsilon. \qquad (4.3)$$

Let $K_\varepsilon = \{x \in M : d(x, K) < \varepsilon\}$. By assertion (e) of Theorem 4.1,

$$\mu_n(K_\varepsilon) > 1 - \varepsilon \qquad (4.4)$$

for $n$ sufficiently large. By the Arzelà-Ascoli theorem, the unit ball $L_{b,1} := \{f \in L_b(M) : \|f\|_{bl} \le 1\}$ restricted to $K$ is a compact subset of $C(K)$. There exists then a finite set $\{f_1, \dots, f_N\} \subset L_{b,1}$ such that for all $f \in L_{b,1}$ there is some $i \in \{1, \dots, N\}$ such that $|f(x) - f_i(x)| < \varepsilon$ for all $x \in K$. Since $f$ and $f_i$ have a Lipschitz constant $\le 1$, we also get that

$$|f(x) - f_i(x)| \le 3\varepsilon \qquad (4.5)$$

for all $x \in K_\varepsilon$. Now

$$|\mu_n f - \mu f| \le |(\mu_n - \mu) f_i| + |(\mu_n - \mu)((f - f_i)\mathbf{1}_{K_\varepsilon})| + |(\mu_n - \mu)((f - f_i)\mathbf{1}_{K_\varepsilon^c})|.$$

Thus, using inequalities (4.3), (4.4), and (4.5), we obtain

$$\rho(\mu_n, \mu) \le \max_{1 \le i \le N} |(\mu_n - \mu) f_i| + 8\varepsilon.$$

This proves (b) for $M$ complete. If $M$ is not complete, we can replace it by its completion $\bar M$. Any map $f \in L_b(M)$ extends to a bounded Lipschitz map on $\bar M$ and the measures $(\mu_n)$ and $\mu$ can be seen as measures on $\bar M$ so that the proof goes through. □


Remark 4.8 Theorem 4.1 is true in any (not necessarily separable) metric space. 
The equivalences in Theorem 4.7 require separability (but not completeness). 


Exercise 4.9 Let $K$ be a compact metric space (and thus also a Polish space). Using the proof of Proposition 4.5, show that $K$ is homeomorphic to a compact subset of $[0, 1]^{\mathbb{N}}$, equipped with the metric $e$. We now identify $K$ with a subset of $[0, 1]^{\mathbb{N}}$. Let $P$ be the set of real-valued functions on $[0, 1]^{\mathbb{N}}$ of the form $p(x) = q(x_0, \dots, x_n)$, where $q : [0, 1]^{n+1} \to \mathbb{R}$ is a polynomial in $(n + 1)$ variables with rational coefficients. Use the Stone-Weierstrass theorem to show that $P|_K = \{p|_K : p \in P\}$ is dense in $C(K)$. This shows that $C(K)$ is separable. Since $C_b(K)$ is a subset of the separable metric space $C(K)$, it is itself separable.

Exercise 4.10 Let $X$ be a topological space. Suppose that there exists an uncountable family $\{O_\alpha\}$ of open sets such that $O_\alpha \cap O_\beta = \emptyset$ for $\alpha \ne \beta$. Show that $X$ is not separable. Show that $C_b(\mathbb{R})$, the set of continuous bounded functions on $\mathbb{R}$, is not separable. Hint: Let $f \in C_b(\mathbb{R})$ be such that $f(n) = 0$ and $f = 1$ on $[n + 1/(n+1), n + 1 - 1/(n+1)]$ for all $n \in \mathbb{N}^*$. Set $f_x(t) := f(x + t)$ and consider the family $\{O_x\}_{x \in [0,1)}$, where $O_x := \{g \in C_b(\mathbb{R}) : \|f_x - g\|_\infty < 1/2\}$.


Exercise 4.11 (Borel Isomorphism) We say that two measurable spaces $X$ and $Y$ are isomorphic if there exists a bi-measurable bijection $\Psi : X \to Y$, meaning that both $\Psi$ and $\Psi^{-1}$ are measurable. It turns out that every Borel subset $M$ of a Polish space is isomorphic to a Borel subset of $[0, 1]$ (see Remark 4.12). The purpose of this exercise is to prove this result when $M$ is compact or locally compact and separable.

(i) Let $\{0, 1\}^{\mathbb{N}^*}$ be equipped with the product topology and Borel $\sigma$-field. Show that $\{0, 1\}^{\mathbb{N}^*}$ is a metric space with the metric $d$ defined as

$$d(\omega, \alpha) := \sum_{i \ge 1} \frac{|\omega_i - \alpha_i|}{2^i}.$$

(ii) Show that the map

$$\Psi : \{0, 1\}^{\mathbb{N}^*} \to [0, 1], \qquad \omega \mapsto \sum_{i \ge 1} \frac{\omega_i}{2^i},$$

is 1-Lipschitz continuous.

(iii) Let $I \subset \{0, 1\}^{\mathbb{N}^*}$ be the set of $\omega$ such that $\omega_i = 0$ for infinitely many $i$ and $\omega_j = 1$ for infinitely many $j$. Show that $I$ is a Borel subset of $\{0, 1\}^{\mathbb{N}^*}$ and that $\Psi|_I$ ($\Psi$ restricted to $I$) is a homeomorphism onto $\Psi(I)$, i.e., a continuous bijection with continuous inverse.

(iv) Show that $[0, 1]$ and $\{0, 1\}^{\mathbb{N}^*}$ are isomorphic. Hint: Use (iii) and the fact that the complement of $I$ in $\{0, 1\}^{\mathbb{N}^*}$ is countably infinite.

(v) Show that there is a homeomorphism between $\{0, 1\}^{\mathbb{N}^*}$ and $(\{0, 1\}^{\mathbb{N}^*})^{\mathbb{N}^*}$ equipped with the metric

$$e(A, B) := \sum_{j \ge 1} \frac{d(A_j, B_j)}{2^j}.$$

Then show that $[0, 1]$ and $[0, 1]^{\mathbb{N}}$ are isomorphic. Relying on the proof of Proposition 4.5, deduce that every compact (or locally compact separable) metric space is isomorphic to a Borel subset of $[0, 1]$. Hint: Any locally compact separable metric space can be written as a countable union of compact sets, see, e.g., Theorem XI.6.3 in [23].


Remark 4.12 Theorem 13.1.1 in [21] implies the following: If M is a Borel subset 
of a Polish space, and if B is a Borel subset of [0, 1] whose cardinality equals the 
cardinality of M, then M and B are isomorphic. Since the cardinality of a Borel 
subset of a Polish space is either finite, countably infinite, or the cardinality of the 
continuum, every such set is in fact isomorphic to a large class of Borel subsets of 
[0, 1]. 


4.1.1 Tightness and Prohorov's Theorem 


A set $\mathcal{P} \subset \mathcal{P}(M)$ is called tight (sometimes uniformly tight) if for every $\varepsilon > 0$ there
exists a compact set $K \subset M$ such that

$$\mu(K) \ge 1 - \varepsilon$$

for all $\mu \in \mathcal{P}$. Observe in particular that if $M$ is compact, every subset of $\mathcal{P}(M)$
is tight. A set $\mathcal{P} \subset \mathcal{P}(M)$ is called relatively compact if it has compact closure in
$\mathcal{P}(M)$ (equipped with one of the distances $\pi$, $\rho$, or any other distance characterizing
weak convergence). Finally, it is called totally bounded if for every $\varepsilon > 0$ there is a
finite set $A \subset \mathcal{P}$ such that the following holds: For every $\mu \in \mathcal{P}$ there is $\nu \in A$ with
$d(\mu, \nu) < \varepsilon$. Here, $d$ can be the Prohorov metric, the Fortet-Mourier metric, or any
other metric on $\mathcal{P}(M)$ characterizing weak convergence.
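
As a quick illustration of the definition of tightness, here is a worked example on $M = \mathbb{R}$. The two families below are assumptions chosen for illustration only; they do not appear in the text.

```latex
% Illustrative example on M = \mathbb{R}; the families P_1 and P_2 are hypothetical.
\begin{align*}
&\text{(1) } \mathcal{P}_1 := \Big\{\mu \in \mathcal{P}(\mathbb{R}) : \int x^2\,\mu(dx) \le 1\Big\}.
  \text{ By Chebyshev's inequality, for every } R > 0,\\
&\qquad \mu\big([-R,R]^c\big) \le \frac{1}{R^2}\int x^2\,\mu(dx) \le \frac{1}{R^2}
  \quad\text{for all } \mu \in \mathcal{P}_1.\\
&\qquad \text{Taking } K = [-R,R] \text{ with } R = \varepsilon^{-1/2} \text{ gives }
  \mu(K) \ge 1 - \varepsilon \text{ for all } \mu \in \mathcal{P}_1, \text{ so } \mathcal{P}_1 \text{ is tight.}\\[4pt]
&\text{(2) } \mathcal{P}_2 := \{\delta_n : n \in \mathbb{N}\}.
  \text{ Every compact } K \subset \mathbb{R} \text{ is bounded, hence } \delta_n(K) = 0
  \text{ for all large } n,\\
&\qquad \text{so no compact set satisfies } \delta_n(K) \ge 1 - \varepsilon \text{ for all } n
  \text{ when } \varepsilon < 1: \ \mathcal{P}_2 \text{ is not tight.}
\end{align*}
```

In the second family the mass escapes to infinity, which is exactly what tightness rules out.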

The following theorem, usually referred to as Prohorov's theorem, asserts that
tightness and relative compactness are equivalent in a Polish space (complete and
separable metric space). Here the assumption that $M$ is a Polish space is crucial,
for otherwise the implication (b) $\Rightarrow$ (a) may be false. See, e.g., Billingsley [11] or
Dudley [21, Chapter 11.5] for a proof of Prohorov's theorem.


Theorem 4.13 (Prohorov's Theorem) Assume $M$ is a Polish space (i.e., a complete
separable metric space). Then the following assertions are equivalent:

(a) $\mathcal{P}$ is tight;

(b) $\mathcal{P}$ is relatively compact;

(c) Every sequence $\{\mu_n\} \subset \mathcal{P}$ has a convergent subsequence $\mu_{n_k} \Rightarrow \mu \in \mathcal{P}(M)$;

(d) $\mathcal{P}$ is totally bounded for $\pi$ or $\rho$.


Remark 4.14 The latter property shows that $\mathcal{P}(M)$ is complete for $\rho$ or $\pi$ since
every Cauchy sequence is totally bounded.


A Tightness Criterion


We conclude this subsection with a simple practical Lyapunov-type condition 
ensuring tightness of a sequence of probability measures. 
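A measurable map $V : M \to \mathbb{R}$ is called proper if for all $R \in \mathbb{R}$ the set

$$\{V \le R\} = \{x \in M : V(x) \le R\}$$

has compact closure.

Proposition 4.15 Let $V : M \to \mathbb{R}_+$ be a proper map and let $\{\mu_n\}$ be a sequence
in $\mathcal{P}(M)$ such that

$$\limsup_{n \to \infty} \mu_n V \le K < \infty.$$

Then $\{\mu_n\}$ is tight. Assume furthermore that $V$ is continuous. Then

(i) For every limit point $\mu$ of $\{\mu_n\}$, $\mu V \le K$;
(ii) Let $H : M \to \mathbb{R}$ be a continuous function such that $G := \frac{V}{1 + |H|}$ is proper. If
$\mu_n \Rightarrow \mu$, then $\mu_n H \to \mu H$.


Proof Fix $\varepsilon > 0$ and let $R > 0$ be so large that $\limsup_{n \to \infty} \mu_n V \le \varepsilon R$. By the
Markov inequality, $\limsup_{n \to \infty} \mu_n\{V > R\} \le \limsup_{n \to \infty} \frac{\mu_n V}{R} \le \varepsilon$. Let now $\mu =
\lim_{k \to \infty} \mu_{n_k}$ be a limit point of $\{\mu_n\}$. Then for all $R > 0$, $\mu(V \wedge R) = \lim_{k \to \infty} \mu_{n_k}(V \wedge
R) \le K$. Thus $\mu V \le K$ by monotone convergence.

We pass to the proof of (ii). Let $G = \frac{V}{1 + |H|}$. For all $R \in \mathbb{R} \setminus D$ with $D$ at most
countable, $\mu\{G = R\} = 0$ and, therefore,

$$\lim_{n \to \infty} \mu_n(H \mathbf{1}_{G \le R}) = \mu(H \mathbf{1}_{G \le R}).$$

On the other hand, $\mu_n(|H| \mathbf{1}_{G > R}) \le \frac{1}{R}\mu_n(V \mathbf{1}_{G > R}) \le \frac{1}{R}\mu_n(V)$. Thus

$$\lim_{R \to \infty} \limsup_{n \to \infty} \mu_n(|H| \mathbf{1}_{G > R}) = 0$$

and, similarly,

$$\lim_{R \to \infty} \mu(|H| \mathbf{1}_{G > R}) = 0.$$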

This proves the result. □


4.2 Invariant Measures


Given a Markov kernel $P$, a measure (respectively a probability measure) $\mu$ is called
$P$-invariant or simply invariant if

$$\mu P f = \mu f \tag{4.6}$$

for all $f \in B(M)$, where $Pf$ is defined by (1.1). Equivalently,

$$\mu P = \mu,$$

where $\mu P$ is defined by (1.2).
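
On a finite state space, a kernel is a stochastic matrix and a measure is a row vector, so invariance can be checked by exact arithmetic. The following minimal sketch does this for a hypothetical two-state kernel; the matrix `P`, the vector `mu`, the test function `f`, and the helper names are illustrative assumptions, not objects from the text.

```python
from fractions import Fraction

def push_forward(mu, P):
    """Row vector mu P: (mu P)(j) = sum_i mu(i) P(i, j)."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def apply_kernel(P, f):
    """Function Pf: (Pf)(i) = sum_j P(i, j) f(j)."""
    return [sum(p * fj for p, fj in zip(row, f)) for row in P]

def integrate(mu, f):
    """Pairing mu f = sum_i mu(i) f(i)."""
    return sum(m * v for m, v in zip(mu, f))

# Hypothetical two-state kernel and candidate invariant measure.
P = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(1, 2), Fraction(1, 2)]]
mu = [Fraction(5, 6), Fraction(1, 6)]
f = [Fraction(3), Fraction(-2)]  # an arbitrary bounded test function

mu_P = push_forward(mu, P)                 # left action of P on measures
lhs = integrate(mu, apply_kernel(P, f))    # mu(Pf)
rhs = integrate(mu, f)                     # mu f
print(mu_P == mu, lhs == rhs)
```

Using `Fraction` avoids rounding, so both formulations of invariance (on measures and on test functions) can be compared with exact equality.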


Exercise 4.16 Let $\mathbb{R}/\mathbb{Z}$ denote the set of equivalence classes with respect to the
equivalence relation $x \sim y \Leftrightarrow x - y \in \mathbb{Z}$ on $\mathbb{R}$. The set $\mathbb{R}/\mathbb{Z}$ can be thought of
as the unit interval $[0,1]$, where 0 and 1 are identified with each other. Let $(\theta_n)_{n \ge 1}$
be an i.i.d. sequence of random variables with distribution $m$, where $m$ is a Borel
probability measure on $\mathbb{R}/\mathbb{Z}$. For every $\theta \in \mathbb{R}$, let

$$F_\theta : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}, \qquad x \mapsto x + \theta \mod 1.$$

Show that the Lebesgue measure on $\mathbb{R}/\mathbb{Z}$ is an invariant probability measure for the
Markov kernel induced by $(F, m)$.


Remark 4.17 Let $\mathcal{C}$ denote a set of bounded, measurable mappings $f : M \to
\mathbb{R}$, closed under multiplication and such that $\mathcal{B}(M) = \sigma(\mathcal{C})$ (the smallest $\sigma$-field
making elements of $\mathcal{C}$ measurable). By a monotone class argument (see
Theorem A.1), it suffices to check (4.6) on $\mathcal{C}$ to prove $P$-invariance of $\mu \in \mathcal{P}(M)$.

For instance, one can choose $\mathcal{C} = C_b(M)$, the set of bounded continuous
functions. One can also choose any set $\mathcal{C} \subset C_b(M)$ closed under multiplication
and such that for all $f \in C_b(M)$ there is a sequence $\{f_n\} \subset \mathcal{C}$ such that
$\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in M$.


We let Inv( P) denote the set of P-invariant probability measures. The set Inv( P) 
might be empty as shown by the following two examples. 


Example 4.18 Let $M = [0,1]$ and $f : M \to M$ be the map defined by $f(x) = x/2$
for $x \neq 0$ and $f(0) = 1$. Then the (deterministic) chain $X_{n+1} = f(X_n)$ has no
invariant probability measure. For otherwise the Poincaré recurrence theorem (see
Theorem 4.41 below) would imply that such a measure is $\delta_0$, but $f(0) = 1$.


Example 4.19 Consider the pair $(F, m)$ introduced in Exercise 4.16. Let us assume
in addition that $\int_{\mathbb{R}} |\theta|\, m(d\theta) < \infty$ and set $\alpha := \int_{\mathbb{R}} \theta\, m(d\theta)$. While the
corresponding Markov kernel $P$ has the Lebesgue measure as an invariant measure,
$P$ does not admit any invariant probability measures if $\alpha \neq 0$.

To see this, let $\mu$ be a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Then there is $K > 0$
such that $\mu([-K, K]) > 0$. If $\mu$ were invariant for $P$, the Markov chain $(X_n)_{n \in \mathbb{N}}$
induced by $(F, m)$ and with $X_0 \sim \mu$ would satisfy

$$0 < \mu([-K, K]) = \mu P^n([-K, K]) = \mathbb{P}(|X_n| \le K), \qquad \forall n \in \mathbb{N}^*.$$

But if $\alpha > 0$ ($\alpha < 0$), one has $\lim_{n \to \infty} X_n = \infty$ ($\lim_{n \to \infty} X_n = -\infty$) $\mathbb{P}$-almost
surely by the law of large numbers. Hence $\lim_{n \to \infty} \mathbb{P}(|X_n| \le K) = 0$, a
contradiction.


Given a Markov chain $(X_n)$ on $M$, the associated family of empirical occupation
measures is defined as

$$\nu_n := \frac{1}{n} \sum_{k=0}^{n-1} \delta_{X_k}, \qquad n \ge 1. \tag{4.7}$$

Notice that each $\nu_n$ is a random element of $\mathcal{P}(M)$.

A sufficient condition ensuring existence of invariant probability measures is
given by the following classical theorem (see, e.g., [22]).


Theorem 4.20 Let $(X_n)$ denote a Feller Markov chain (defined on $(\Omega, \mathcal{F}, \mathbb{P})$) on
$M$ with kernel $P$. Then the following statements hold.


(i) $\mathbb{P}$-almost surely, every limit point of the family of empirical occupation
measures $(\nu_n)_{n \ge 1}$ is $P$-invariant;
(ii) If $(\nu_n)_{n \ge 1}$ is tight with positive $\mathbb{P}$-probability, then $\mathrm{Inv}(P)$ is nonempty.


Proof

(i) Let $f \in B(M)$. Set $U_{n+1} := f(X_{n+1}) - Pf(X_n)$, $M_0 := 0$, and $M_{n+1} :=
M_n + U_{n+1}$ for $n \ge 0$. Then $(M_n)$ is an $L^2$-martingale, whose predictable
quadratic variation (see the section on martingale theory in the appendix)
verifies

$$\langle M \rangle_{n+1} - \langle M \rangle_n = \mathbb{E}(U_{n+1}^2 \mid \mathcal{F}_n) = Pf^2(X_n) - (Pf)^2(X_n) \le \|f\|_\infty^2.$$

Hence by the strong law of large numbers for martingales (see Theorem A.8),

$$0 = \lim_{n \to \infty} \frac{M_n}{n} = \lim_{n \to \infty} \big(\nu_n f - \nu_n(Pf)\big) \tag{4.8}$$

almost surely. Let $(f_k) \subset C_b(M)$ be as in Proposition 4.5. Then, by the Feller
property, $Pf_k$ is in $C_b(M)$ for all $k$ and, consequently, with probability one

$$\nu f_k - \nu(Pf_k) = 0$$

for every limit point $\nu$ of $\{\nu_n\}$ and every $k \in \mathbb{N}$. Thus, $\nu = \nu P$.

(ii) Let $\omega \in \Omega$ be such that $(\nu_n(\omega))_{n \ge 1}$ is tight and all of its limit points are $P$-invariant.
By Prohorov's theorem, $(\nu_n(\omega))_{n \ge 1}$ admits at least one limit point,
so $\mathrm{Inv}(P)$ is nonempty. □
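
The statement of Theorem 4.20 (i) can be observed numerically on a finite state space, where every kernel is Feller and tightness is automatic. The sketch below simulates a hypothetical two-state chain and measures how far the empirical occupation measure is from being invariant. The kernel `P`, the seed, the run length, and the convention $\nu_n = \frac{1}{n}\sum_{k=0}^{n-1}\delta_{X_k}$ used in the code are all assumptions made for illustration.

```python
import random

def simulate_occupation(P, x0, n, rng):
    """Empirical occupation measure nu_n = (1/n) * sum_{k=0}^{n-1} delta_{X_k}
    of a finite-state chain with transition matrix P started at x0."""
    counts = [0] * len(P)
    x = x0
    for _ in range(n):
        counts[x] += 1
        # Sample the next state from the row P[x] by inverting the CDF.
        u = rng.random()
        acc = 0.0
        nxt = len(P) - 1  # fallback guards against floating-point rounding
        for j, p in enumerate(P[x]):
            acc += p
            if u < acc:
                nxt = j
                break
        x = nxt
    return [c / n for c in counts]

# Hypothetical two-state kernel; its invariant law is (5/6, 1/6).
P = [[0.9, 0.1], [0.5, 0.5]]
rng = random.Random(0)  # fixed seed so the run is reproducible
nu = simulate_occupation(P, x0=0, n=200_000, rng=rng)

# Invariance defect: sup-distance between nu_n P and nu_n.
nu_P = [sum(nu[i] * P[i][j] for i in range(2)) for j in range(2)]
defect = max(abs(a - b) for a, b in zip(nu, nu_P))
print(nu, defect)
```

The defect is of order $n^{-1/2}$, in line with the martingale argument in the proof; on this small example the occupation measure also approaches the unique invariant law.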
Corollary 4.21 If $M$ is compact and $P$ is Feller, $\mathrm{Inv}(P)$ is a nonempty compact
convex subset of $\mathcal{P}(M)$. Convexity of $\mathrm{Inv}(P)$ holds for arbitrary metric spaces and
Markov kernels.


4.2.1 Tightness Criteria for Empirical Occupation Measures 


When $M$ is noncompact, the tightness of the empirical occupation measures $(\nu_n)$
can be ensured by the existence of a convenient Lyapunov function. This is a proper
map $V : M \to \mathbb{R}_+$ such that $PV - V$ is "sufficiently" negative.


Corollary 4.22 Let $V : M \to \mathbb{R}_+$ be a proper map. Assume that $PV \le V$ and that
$\mathbb{E}(V(X_0)) < \infty$. Then the family of empirical occupation measures $(\nu_n)$ is almost
surely tight.


Proof The sequence $(V_n := V(X_n))$ being a nonnegative supermartingale with
$\mathbb{E}(V_0) < \infty$, it converges almost surely to some finite random variable $V_\infty$ (see
Theorem A.6). This implies that $\nu_n V \to V_\infty$ almost surely and the result follows
from Proposition 4.15. □


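Another result, in the same spirit, is

Corollary 4.23 Let $V : M \to \mathbb{R}_+$ be a proper map. Assume that

$$PV \le \rho V + \kappa,$$

with $\kappa \ge 0$, $0 \le \rho < 1$, and $\mathbb{E}(V(X_0)) < \infty$. Then

$$\limsup_{n \to \infty} \nu_n \sqrt{V} \le \frac{\sqrt{\kappa}}{1 - \sqrt{\rho}}$$

almost surely. In particular, $(\nu_n)$ is tight. The set $\mathrm{Inv}(P)$ is a nonempty compact
convex subset of $\mathcal{P}(M)$ and for all $\mu \in \mathrm{Inv}(P)$, $\mu V \le \frac{\kappa}{1 - \rho}$.

Proof Set $W = \sqrt{V}$. Then, by Jensen's inequality,

$$PW \le \sqrt{PV} \le \sqrt{\rho V + \kappa} \le \sqrt{\rho}\, W + \sqrt{\kappa}.$$

Set $LW(x) = PW(x) - W(x)$, $M_0 = 0$, and

$$M_n = W(X_n) - W(X_0) - \sum_{k=0}^{n-1} LW(X_k)$$

for all $n \ge 1$. Then $(M_n)$ is an $L^2$-martingale whose predictable quadratic variation
process is given as $\langle M \rangle_0 = 0$ and

$$\langle M \rangle_{n+1} - \langle M \rangle_n = \mathbb{E}\big((M_{n+1} - M_n)^2 \mid \mathcal{F}_n\big) = PV(X_n) - (PW)^2(X_n) \le PV(X_n)$$

for $n \ge 0$. Thus

$$\mathbb{E}(\langle M \rangle_n) \le \sum_{k=1}^{n} \mathbb{E}\big(P^k V(X_0)\big) \le n \frac{\kappa}{1 - \rho} + \frac{\rho}{1 - \rho}\, \mathbb{E}(V(X_0)),$$

where the last inequality easily follows from the assumptions on $V$. Then, by the
second strong law of large numbers for $L^2$-martingales (Theorem A.8 (iv)), $\frac{M_n}{n} \to
0$ almost surely. Now, because $-LW \ge (1 - \sqrt{\rho})W - \sqrt{\kappa}$,

$$(1 - \sqrt{\rho})\, \nu_n W \le \sqrt{\kappa} + \frac{M_n}{n} + \frac{W(X_0)}{n}.$$

This, combined with Proposition 4.15, proves the first statement.

By Theorem 4.20, $\mathrm{Inv}(P)$ is nonempty. Let $\mu \in \mathrm{Inv}(P)$. For all $n \in \mathbb{N}^*$,

$$P^n V \le \rho^n V + \kappa \sum_{k=0}^{n-1} \rho^k \le \rho^n V + \frac{\kappa}{1 - \rho}.$$

Thus, by invariance and Jensen's inequality,

$$\mu(V \wedge M) = \mu P^n(V \wedge M) \le \mu(P^n V \wedge M) \le \mu\Big(\big(\rho^n V + \tfrac{\kappa}{1 - \rho}\big) \wedge M\Big).$$

Letting $n \to \infty$ in the right-hand term and using dominated convergence shows
that $\mu(V \wedge M) \le \frac{\kappa}{1 - \rho} \wedge M$. Then $\mu V \le \frac{\kappa}{1 - \rho}$ by monotone convergence. Compactness
follows from Proposition 4.15 and Prohorov's theorem. □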
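
To see the drift condition $PV \le \rho V + \kappa$ at work, here is a worked example for a linear autoregressive chain on $M = \mathbb{R}$. The chain, the noise assumptions, and the choice $V(x) = x^2$ are illustrative assumptions and are not taken from the text.

```latex
% Illustrative example (assumption): X_{n+1} = a X_n + \xi_{n+1} on M = \mathbb{R},
% with |a| < 1 and (\xi_n) i.i.d., centered, with variance \sigma^2, independent of X_0.
\begin{align*}
V(x) &= x^2, \qquad \{V \le R\} = [-\sqrt{R}, \sqrt{R}] \ \text{is compact for } R \ge 0,
        \ \text{so } V \text{ is proper};\\
PV(x) &= \mathbb{E}\big((a x + \xi_1)^2\big)
       = a^2 x^2 + 2 a x\, \mathbb{E}(\xi_1) + \mathbb{E}(\xi_1^2)
       = a^2 V(x) + \sigma^2;\\
\rho &= a^2 < 1, \qquad \kappa = \sigma^2;\\
\mu V &= \int x^2\, \mu(dx) \le \frac{\kappa}{1 - \rho} = \frac{\sigma^2}{1 - a^2}
        \qquad \text{for every } \mu \in \mathrm{Inv}(P).
\end{align*}
```

When the noise is Gaussian, the stationary law is $N(0, \sigma^2/(1 - a^2))$, so in this example the bound on $\mu V$ is attained.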


Exercise 4.24 (Invariant Measures and Mean-Occupation) Let $(X_n)$ be a
Markov chain, $T$ a finite stopping time (i.e., $T < \infty$ a.s.) and let $\nu$ be the "mean
occupation measure up to time $T$" defined for all $f \in B(M)$, $f \ge 0$, as

$$\nu f := \mathbb{E}\Big(\sum_{k=0}^{T-1} f(X_k)\Big).$$

(i) Show that $\nu(Pf) - \nu f = \mathbb{E}(f(X_T)) - \mathbb{E}(f(X_0))$.
(ii) Show that if $X_0$ and $X_T$ have the same distribution and $\mathbb{E}(T) < \infty$, then $\frac{\nu}{\mathbb{E}(T)}$
is an invariant probability measure for the chain.


4.3 Excessive Measures


A measure $\mu$ is called excessive provided

$$\mu P \le \mu.$$


Lemma 4.25 Every finite excessive measure is invariant.


Proof If $\mu$ is a finite excessive measure, then $\mu P(A) \le \mu(A)$ and $\mu(M) -
\mu P(A) = \mu P(A^c) \le \mu(A^c) = \mu(M) - \mu(A)$, so that $\mu P(A) = \mu(A)$. □


Given two Borel measures $\alpha$ and $\beta$ on $M$, one calls $\alpha$ absolutely continuous with
respect to $\beta$ and writes $\alpha \ll \beta$ if for every $A \in \mathcal{B}(M)$, $\beta(A) = 0$ implies that
$\alpha(A) = 0$. One says that $\alpha$ and $\beta$ are mutually singular and writes $\alpha \perp \beta$ if there is
$A \in \mathcal{B}(M)$ such that $\alpha(A) = \beta(A^c) = 0$. Let $\mu$ and $\nu$ be Borel measures on $M$. By
Lebesgue's decomposition theorem (see, e.g., Theorem 3.8 in [27]), $\nu = \nu_{ac} + \nu_s$,
where $\nu_{ac} \ll \mu$ and $\nu_s \perp \mu$. Equivalently,

$$\nu(dx) = h(x)\mu(dx) + \mathbf{1}_A(x)\nu(dx),$$

where $h \in L^1(\mu)$ and $\mu(A) = 0$.


Lemma 4.26 Let $\mu, \nu \in \mathrm{Inv}(P)$. Then the absolutely continuous and the singular
parts of $\nu$ with respect to $\mu$ are invariant measures.


Proof Write $\nu(dx) = h(x)\mu(dx) + \mathbf{1}_A(x)\nu(dx)$ with $h \in L^1(\mu)$ and $\mu(A) = 0$.
By invariance, $\mu(A) = \int P(x, A)\mu(dx) = 0$, so that $P(x, A) = 0$ for $\mu$-almost
every $x \in M$. Thus, for every Borel set $B$,

$$\int P(x, B) h(x)\, \mu(dx) = \int P(x, B \cap A^c) h(x)\, \mu(dx) \le \nu(B \cap A^c) = (h\mu)(B).$$

This proves that $h(x)\mu(dx)$ is finite and excessive, hence invariant. Since
$\mathbf{1}_A(x)\nu(dx) = \nu(dx) - h(x)\mu(dx)$ and since $\nu \in \mathrm{Inv}(P)$, the measure $\mathbf{1}_A(x)\nu(dx)$
is invariant as well. □


4.4 Ergodic Measures 


Let $\mu \in \mathrm{Inv}(P)$. A bounded, measurable function $g$ is called $(P, \mu)$-invariant
provided $Pg = g$, $\mu$-almost surely. A set $A \in \mathcal{B}(M)$ is called $(P, \mu)$-invariant
if $\mathbf{1}_A$ is $(P, \mu)$-invariant.

An invariant probability measure $\mu$ is called ergodic (for $P$) if every $(P, \mu)$-invariant
function is $\mu$-almost surely constant. (A function $f : M \to \mathbb{R}$ is called
$\mu$-almost surely constant if there is $c \in \mathbb{R}$ such that $f(x) = c$ for $\mu$-almost every
$x \in M$.)


Lemma 4.27 A probability measure $\mu \in \mathrm{Inv}(P)$ is ergodic if and only if every
$(P, \mu)$-invariant set has $\mu$-measure 0 or 1.


Proof Suppose first that $\mu \in \mathrm{Inv}(P)$ is not ergodic. Then there exists a bounded,
measurable function $h$ such that $Ph = h$, $\mu$-almost surely, and for every $c \in \mathbb{R}$

$$\mu(\{x \in M : h(x) = c\}) < 1.$$

It follows that for some $c \in \mathbb{R}$, $A := \{x \in M : h(x) > c\}$ has $\mu$-measure different
from 0 and 1.


Claim $A$ is $(P, \mu)$-invariant.


Proof of the Claim By Jensen's inequality, $|Ph| \le P|h|$. Since $\mu(P|h| - |h|) =
0$ by $P$-invariance of $\mu$, and since $Ph = h$ $\mu$-almost surely, this proves that $|h|$
is $(P, \mu)$-invariant as well. Hence, $\max(0, h) = \frac{1}{2}(h + |h|)$ is $(P, \mu)$-invariant.
Similarly,

$$h_n := \min(n \max(0, h - c), 1)$$

is $(P, \mu)$-invariant for every $n \ge 1$. Since $\lim_{n \to \infty} h_n = \mathbf{1}_A$, $\mathbf{1}_A$ is $(P, \mu)$-invariant
as the pointwise limit of a uniformly bounded sequence of $(P, \mu)$-invariant
functions. This proves the claim and one direction of the lemma.

For the converse direction, let $\mu$ be ergodic and let $A$ be a $(P, \mu)$-invariant set.
Then $\mathbf{1}_A$ is a $(P, \mu)$-invariant function, and ergodicity of $\mu$ implies that there is
$c \in \mathbb{R}$ such that $\mathbf{1}_A$ is $\mu$-almost surely equal to $c$. Necessarily, $c \in \{0, 1\}$, whence it
follows that $\mu(A) \in \{0, 1\}$. □


Remark 4.28 One usually defines a harmonic map as a measurable map $f$ (bounded
or nonnegative) such that $Pf = f$. Note that a harmonic map is $(P, \mu)$-invariant for
every $\mu \in \mathrm{Inv}(P)$.


A probability measure $\mu \in \mathrm{Inv}(P)$ is called extremal if it cannot be written as
$\mu = (1 - t)\mu_0 + t\mu_1$ with $\mu_0, \mu_1 \in \mathrm{Inv}(P)$, $0 < t < 1$, and $\mu_0 \neq \mu_1$. Notice
that an extremal invariant probability measure cannot be written as the sum of two
nontrivial invariant measures that are mutually singular. This fact will be used below
in the proof of Proposition 4.29 (ii).


Proposition 4.29

(i) An invariant probability measure $\mu$ is ergodic if and only if it is extremal.
(ii) Two distinct ergodic measures are mutually singular.

Proof


(i) Suppose that $\mu$ is nonergodic. By Lemma 4.27, there exists a $(P, \mu)$-invariant
set $A$ such that $0 < \mu(A) < 1$. Let $\mu_A(\cdot) := \mu(A \cap \cdot)/\mu(A)$. We claim that
for every $g \in B(M)$,

$$P(g\mathbf{1}_A) = (Pg)\mathbf{1}_A$$

$\mu$-almost surely. Indeed, by the Cauchy-Schwarz inequality,

$$|P(g\mathbf{1}_A)|^2 \le P(g^2) P(\mathbf{1}_A) = P(g^2)\mathbf{1}_A$$

$\mu$-almost surely. Thus $P(g\mathbf{1}_A)\mathbf{1}_{A^c} = 0$, $\mu$-almost surely, and, interchanging
the roles of $A$ and $A^c$, $P(g\mathbf{1}_{A^c})\mathbf{1}_A = 0$, $\mu$-almost surely. On the other hand,

$$P(g\mathbf{1}_A) - (Pg)\mathbf{1}_A = [P(g\mathbf{1}_A) - Pg]\mathbf{1}_A + P(g\mathbf{1}_A)\mathbf{1}_{A^c} = -P(g\mathbf{1}_{A^c})\mathbf{1}_A + P(g\mathbf{1}_A)\mathbf{1}_{A^c} = 0.$$

This proves the claim. Therefore,

$$\mu_A(Pg) = \frac{1}{\mu(A)}\mu((Pg)\mathbf{1}_A) = \frac{1}{\mu(A)}\mu(P(g\mathbf{1}_A)) = \frac{1}{\mu(A)}\mu(g\mathbf{1}_A) = \mu_A(g).$$

This proves that $\mu_A$ is an invariant probability measure. Similarly, $\mu_{A^c}$ is an
invariant probability measure, and since $\mu = \mu(A)\mu_A + (1 - \mu(A))\mu_{A^c}$, the
probability measure $\mu$ is nonextremal.

Suppose now that $\mu$ is ergodic and that $\mu = (1 - t)\mu_0 + t\mu_1$ with $\mu_0, \mu_1 \in
\mathrm{Inv}(P)$ and $t \in [0, 1]$. If $t \neq 0$, $\mu_1 \ll \mu$. Hence, there exists $h \in L^1(\mu)$ such
that $\mu_1(dx) = h(x)\mu(dx)$. Furthermore, $h \le 1/t$, $\mu$-almost surely, because
for all $c > 0$,

$$t c\, \mu\{h \ge c\} \le t \mu_1\{h \ge c\} \le \mu\{h \ge c\}.$$

In particular, $h$ and $Ph$ lie in $L^2(\mu)$. Then, by Jensen's inequality,

$$0 \le \mu((Ph - h)^2) = \mu((Ph)^2 - 2hPh + h^2) \le \mu(Ph^2 - 2hPh + h^2) = 2\mu h^2 - 2\mu_1 Ph = 2\mu h^2 - 2\mu_1 h = 0,$$

from which it follows that $Ph = h$, $\mu$-almost surely, and, by ergodicity and the
fact that $\mu$ and $\mu_1$ are probability measures, $h = 1$. As a result, $t = 1$ and $\mu$ is
extremal.

(ii) Let $\mu$ and $\nu$ be ergodic. Write $\nu(dx) = h(x)\mu(dx) + \mu_s(dx)$ with $\mu_s$ singular
with respect to $\mu$ and $h \in L^1(\mu)$. By Lemma 4.26, $h(x)\mu(dx)$ and $\mu_s$ are
invariant, and by extremality either $h = 0$ or $\mu_s = 0$. If $h = 0$, we are done. If
$\mu_s = 0$, we claim that $Ph = h$, $\mu$-almost surely. Thus, by ergodicity, $h = 1$,
$\mu$-almost surely. This yields $\mu = \nu$ and we are done. The proof of the claim is
easy if $h \in L^2(\mu)$ because, reasoning exactly as in the end of the proof of (i),
one finds that $\mu((Ph - h)^2) = 0$. If now $h \in L^1(\mu) \setminus L^2(\mu)$, set $h_n := h \wedge n$
and $\mu_n := h_n \mu$. Then, for all $A \in \mathcal{B}(M)$, $\mu_n P(A) \le n\mu P(A) = n\mu(A)$ and
$\mu_n P(A) \le \nu P(A) = \nu(A)$. Thus

$$\mu_n P(A) = \mu_n P(A \cap \{h \le n\}) + \mu_n P(A \cap \{h > n\}) \le \nu(A \cap \{h \le n\}) + n\mu(A \cap \{h > n\}) = \mu_n(A).$$

This shows that $\mu_n$ is excessive, hence invariant, by Lemma 4.25. Thus,
$Ph_n = h_n$, by what precedes, and $Ph = h$, $\mu$-almost surely, by monotone
convergence.
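
The link between ergodicity, extremality, and mutual singularity is easy to see on a finite chain with two closed classes. The sketch below uses a hypothetical 4-state kernel (the matrix `P`, the measures, and the helper names are assumptions made for illustration): each class carries one ergodic measure, the two are mutually singular, and any strict mixture is invariant but not ergodic because the indicator of a class is a nonconstant invariant function.

```python
from fractions import Fraction as F

# Hypothetical kernel with two closed classes {0, 1} and {2, 3}.
P = [
    [F(1, 2), F(1, 2), F(0), F(0)],
    [F(1, 4), F(3, 4), F(0), F(0)],
    [F(0), F(0), F(1, 3), F(2, 3)],
    [F(0), F(0), F(1, 2), F(1, 2)],
]

def push_forward(mu, P):
    """Row vector mu P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def apply_kernel(P, g):
    """Function Pg."""
    return [sum(p * gj for p, gj in zip(row, g)) for row in P]

mu0 = [F(1, 3), F(2, 3), F(0), F(0)]   # invariant, carried by {0, 1}
mu1 = [F(0), F(0), F(3, 7), F(4, 7)]   # invariant, carried by {2, 3}

t = F(1, 4)
mu = [(1 - t) * a + t * b for a, b in zip(mu0, mu1)]  # strict mixture

ind_A = [F(1), F(1), F(0), F(0)]       # indicator of A = {0, 1}
mass_A = sum(m * i for m, i in zip(mu, ind_A))
mu_A = [m * i / mass_A for m, i in zip(mu, ind_A)]    # mu(A ∩ ·) / mu(A)

print(push_forward(mu, P) == mu, apply_kernel(P, ind_A) == ind_A, mass_A)
```

Here `mu_A` recovers `mu0`, which mirrors the construction of $\mu_A$ in the proof of part (i).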


4.5 Unique Ergodicity 


We say that (Xn) or P is uniquely ergodic if the set of P-invariant probability 
measures has cardinality one. An immediate consequence of the preceding section 
is 

Proposition 4.30 If P is uniquely ergodic, then its invariant probability measure is 
ergodic. 


While a deterministic dynamical system is rarely uniquely ergodic (see Sect. 4.6 for 
a definition of ergodic probability measures for deterministic dynamical systems), 
this property is much more often satisfied by random dynamical systems and 
Markov chains. We start with a simple situation, which can be seen as a random 
version of the Banach fixed point theorem.


4.5.1 Unique Ergodicity of Random Contractions


Throughout this subsection, let $M$ be a complete, separable metric space. Recall that
a map $f : M \to M$ is a contraction if its Lipschitz constant

$$\mathrm{Lip}(f) := \sup_{x \neq y} \frac{d(f(x), f(y))}{d(x, y)}$$

is $< 1$. By the Banach fixed point theorem, a contraction $f$ has a unique fixed point
$x^*$, and for all $x \in M$, $f^n(x) \to x^*$ at an exponential rate. Here, using the notation
of Chap. 3, we shall consider a Markov chain recursively defined by

$$X_{n+1} = F_{\theta_{n+1}}(X_n)$$

under the assumption that the maps $F_\theta$ are contracting on average.

More precisely, we assume that for each $\theta \in \Theta$, the map $F_\theta$ is Lipschitz
continuous, and we let $l_\theta := \mathrm{Lip}(F_\theta)$. Note that, by separability, the supremum
in the definition of the Lipschitz constant can be chosen over a countable set, so that
$l_\theta$ is measurable in $\theta$.

We say that the family $(F_\theta)$ is contracting on average if $\int \log(l_\theta)^+\, m(d\theta) < \infty$
and

$$\int \log(l_\theta)\, m(d\theta) =: -\alpha < 0.$$

Here, we allow for $\alpha$ to be $+\infty$. The next result is classical and has been proved in
several places. Here we follow the approach of Diaconis and Freedman [19].


Theorem 4.31 Assume that $\{F_\theta\}$ is contracting on average and that

$$\int \log\big(d(F_\theta(x_0), x_0)\big)^+\, m(d\theta) < \infty \tag{4.9}$$

for some $x_0 \in M$. Then the induced Markov chain has a unique invariant probability
measure $\mu^*$, and $X_n$ converges in distribution to $\mu^*$. In other words, for every
probability measure $\mu$ on $M$,

$$\mu P^n \Rightarrow \mu^*.$$

If we furthermore assume that $\alpha < \infty$,

$$A := \sup_\theta |\log(l_\theta) + \alpha| < \infty$$

and

$$B := \int d(F_\theta(x_0), x_0)\, m(d\theta) < \infty,$$

then for every $x \in M$ there is $C(x) > 0$ such that

$$\rho(\delta_x P^n, \mu^*) \le C(x) e^{-n\beta}, \qquad \forall n \in \mathbb{N},$$

where $\rho$ stands for the Fortet-Mourier distance (see (4.2)),
and $\beta := \min\{\alpha/4,\ \alpha^2/(32A^2)\}$.


Proof For all $x \in M$, set $X_n^x := F_{\theta_n} \circ \ldots \circ F_{\theta_1}(x)$ and $Y_n^x := F_{\theta_1} \circ \ldots \circ F_{\theta_n}(x)$.
The idea of the proof is to show that $(Y_n^x)$ converges almost surely (and thus in law)
to some random variable $Y_\infty$, independent of $x$. Since $X_n^x$ and $Y_n^x$ have the same law,
this implies that $(X_n^x)$ converges in law to $Y_\infty$.
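
The difference between the forward iterates $X_n^x$ and the backward iterates $Y_n^x$ can be seen in a few lines. The sketch below uses a hypothetical family of affine contractions $F_\theta(x) = x/2 + \theta$ with $\theta \in \{0, 1/2\}$ chosen uniformly; the family, the seed, and the function names are illustrative assumptions. Successive backward iterates differ by at most $L_n\, d(F_{\theta_{n+1}}(x_0), x_0) \le 2^{-n} \cdot \tfrac{1}{2}$, so they form a Cauchy sequence, while the forward iterates keep fluctuating.

```python
import random

def make_map(theta):
    """Affine contraction F_theta(x) = x / 2 + theta (Lipschitz constant 1/2)."""
    return lambda x: 0.5 * x + theta

def forward(thetas, x):
    """X_n^x = F_{theta_n} o ... o F_{theta_1}(x): theta_1 is applied first."""
    for th in thetas:
        x = make_map(th)(x)
    return x

def backward(thetas, x):
    """Y_n^x = F_{theta_1} o ... o F_{theta_n}(x): theta_n is applied first."""
    for th in reversed(thetas):
        x = make_map(th)(x)
    return x

rng = random.Random(1)  # fixed seed for reproducibility
thetas = [rng.choice([0.0, 0.5]) for _ in range(40)]
x0 = 0.0

# Y[i] holds Y_{i+1} and X[i] holds X_{i+1}, both started at x0.
Y = [backward(thetas[:n], x0) for n in range(1, 41)]
X = [forward(thetas[:n], x0) for n in range(1, 41)]

# backward_steps[n - 1] = |Y_{n+1} - Y_n| for n = 1, ..., 39.
backward_steps = [abs(Y[n] - Y[n - 1]) for n in range(1, 40)]
print(Y[-1], backward_steps[-1], X[-3:])
```

For each fixed $n$, `X[n - 1]` and `Y[n - 1]` have the same law, but only the backward sequence converges along a single realization, which is the mechanism exploited in the proof.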

To shorten notation, set $l_n := l_{\theta_n}$, $L_n := \prod_{i=1}^{n} l_i$, and $Y_n := Y_n^{x_0}$ for $x_0$ as
in (4.9). By the strong law of large numbers, $\mathbb{P}$-almost surely,

$$\lim_{n \to \infty} \frac{\log(L_n)}{n} = -\alpha \in [-\infty, 0). \tag{4.10}$$

Thus, $\mathbb{P}$-almost surely,

$$\limsup_{n \to \infty} \frac{\log(d(Y_n^x, Y_n^y))}{n} \le -\alpha$$

because $d(Y_n^x, Y_n^y) \le L_n d(x, y)$. We shall now show that $(Y_n)$ is almost surely
Cauchy, hence convergent by completeness of $M$.
For all $n, p \in \mathbb{N}$,

$$d(Y_{n+p}, Y_n) \le \sum_{i=0}^{p-1} d(Y_{n+i+1}, Y_{n+i}) \le \sum_{i \ge 0} L_{n+i}\, d(F_{\theta_{n+i+1}}(x_0), x_0). \tag{4.11}$$

Let $0 < \varepsilon < \alpha/2$. Then

$$\sum_{n \ge 1} \mathbb{P}\big(\log d(F_{\theta_n}(x_0), x_0) \ge \varepsilon n\big) \le \sum_{n \ge 1} \mathbb{P}\big(\log(d(F_{\theta_n}(x_0), x_0))^+ \ge \varepsilon n\big) \le \frac{1}{\varepsilon}\, \mathbb{E}\big(\log(d(F_{\theta_1}(x_0), x_0))^+\big) < \infty.$$

(Here, we used that $\sum_{n \ge 1} \mathbb{P}(\xi \ge n) \le \mathbb{E}(\xi)$ for every nonnegative random variable
$\xi$, as well as the integrability condition in (4.9).) Thus, by Borel-Cantelli,

$$\limsup_{n \to \infty} \frac{\log d(F_{\theta_n}(x_0), x_0)}{n} \le \varepsilon$$

almost surely. Combined with (4.11) and (4.10), it follows that, almost surely, for $n$
large enough,

$$d(Y_{n+p}, Y_n) \le \sum_{i \ge 0} e^{-(n+i)(\alpha - 2\varepsilon)}.$$

This concludes the proof of the first statement, with $\mu^*$ the law of the limiting
random variable $Y_\infty$ (see also Exercise 4.32).


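We now pass to the second statement. For every bounded Lipschitz function $f$
with $\|f\|_{bl} \le 1$ and for every $\delta > 0$,

$$|\delta_x P^n f - \mu^* f| = |\mathbb{E}(f(Y_n^x) - f(Y_\infty))| \le \delta + 2\mathbb{P}(d(Y_n^x, Y_\infty) \ge \delta). \tag{4.12}$$

First observe that by (4.11),

$$d(Y_n^x, Y_\infty) \le d(Y_n^x, Y_n) + d(Y_n, Y_\infty) \le L_n d(x, x_0) + \sum_{i \ge 0} L_{n+i}\, d(x_0, F_{\theta_{n+i+1}}(x_0)).$$

By Markov's inequality,

$$\mathbb{P}\big(d(x_0, F_{\theta_n}(x_0)) \ge e^{\varepsilon n}\big) \le B e^{-\varepsilon n}$$

and by a standard Chernoff inequality (see Exercise 4.33 below),

$$\mathbb{P}\big(L_n \ge e^{-n(\alpha - \varepsilon)}\big) \le e^{-n\varepsilon^2/(2A^2)}.$$

Thus

$$\mathbb{P}\Big(d(Y_n^x, Y_\infty) \ge e^{-n(\alpha - \varepsilon)} d(x, x_0) + \sum_{i \ge 0} e^{-(n+i)(\alpha - 2\varepsilon)}\Big) \le \sum_{i \ge 0} \Big(e^{-(n+i)\varepsilon^2/(2A^2)} + B e^{-(n+i)\varepsilon}\Big).$$

Choose $\varepsilon = \alpha/4$. Then

$$\mathbb{P}\Big(d(Y_n^x, Y_\infty) \ge e^{-n\beta}\Big(d(x, x_0) + \frac{1}{1 - e^{-\alpha/2}}\Big)\Big) \le \frac{e^{-n\alpha^2/(32A^2)}}{1 - e^{-\alpha^2/(32A^2)}} + \frac{B e^{-n\alpha/4}}{1 - e^{-\alpha/4}},$$

and we obtain the desired estimate with the help of (4.12). □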


Exercise 4.32 Let $P$ be a Markov kernel on a separable metric space $M$, and let $\mu^*$
be a Borel probability measure on $M$ such that for every $x \in M$, $\delta_x P^n$ converges
weakly to $\mu^*$ as $n \to \infty$. Show that if $P$ is Feller, then $\mu^*$ is the unique invariant
probability measure for $P$.


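Exercise 4.33 (Chernoff Bounds) Let $X$ be an $L^1$-random variable with zero
mean. Assume that $\mathbb{E}(e^{\lambda_0 X}) < \infty$ for some $\lambda_0 > 0$. Let $g(\lambda) := \ln(\mathbb{E}(e^{\lambda X}))$.

(i) Show that for all $\varepsilon > 0$ and $0 < \lambda < \lambda_0$,

$$\mathbb{P}(X \ge \varepsilon) \le e^{-(\lambda\varepsilon - g(\lambda))}$$

and

$$\mathbb{P}(X \ge \varepsilon) \le e^{-g^*(\varepsilon)},$$

where

$$g^*(\varepsilon) := \sup_{0 < \lambda < \lambda_0} (\lambda\varepsilon - g(\lambda)).$$

(ii) Assume $|X| \le A < \infty$. Show that $g(\lambda) \le \frac{\lambda^2 A^2}{2}$ and $g^*(\varepsilon) \ge \frac{\varepsilon^2}{2A^2}$. Hint: For
the first inequality, it may help to use convexity of $g$.

(iii) Let $(X_n)$ be a sequence of i.i.d. random variables with the same distribution as
$X$. Show that

$$\mathbb{P}(X_1 + \ldots + X_n \ge n\varepsilon) \le e^{-n g^*(\varepsilon)}.$$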
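
The bound obtained by combining (ii) and (iii), namely $\mathbb{P}(X_1 + \ldots + X_n \ge n\varepsilon) \le e^{-n\varepsilon^2/(2A^2)}$, can be compared with an exact tail probability in a simple case. The sketch below takes i.i.d. random signs ($X_i = \pm 1$ with probability $1/2$, so $A = 1$); this choice of distribution, the values of `n` and `eps`, and the function names are assumptions made for illustration.

```python
import math

def rademacher_tail(n, eps):
    """Exact P(X_1 + ... + X_n >= n * eps) for i.i.d. signs X_i = +1 or -1,
    each with probability 1/2."""
    # S_n = 2 * H - n with H ~ Binomial(n, 1/2), so S_n >= n * eps
    # is equivalent to H >= n * (1 + eps) / 2.
    h_min = math.ceil(n * (1 + eps) / 2 - 1e-12)  # small slack against rounding
    return sum(math.comb(n, h) for h in range(h_min, n + 1)) / 2 ** n

def chernoff_bound(n, eps, A=1.0):
    """Upper bound exp(-n * eps^2 / (2 * A^2))."""
    return math.exp(-n * eps ** 2 / (2 * A ** 2))

checks = [(n, eps, rademacher_tail(n, eps), chernoff_bound(n, eps))
          for n in (10, 50, 200) for eps in (0.1, 0.3, 0.5)]
for n, eps, tail, bound in checks:
    print(n, eps, tail, bound)
```

The exact tails sit below the bound in every case, and the gap widens as $n$ grows, which is typical of exponential bounds of this kind.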
4.6 Classical Results from Ergodic Theory


We first recall some basic definitions from ergodic theory. There are numerous
textbooks on the subject including Cornfeld, Fomin, Sinai [17], Mañé [48], Katok
and Hasselblatt [42].

Let $(X, \mathcal{F})$ be a measurable space and $T : X \to X$ a measurable mapping. A
probability measure $\mathbb{P}$ over $X$ is called $T$-invariant (or simply invariant) if

$$\mathbb{P}(T^{-1}(A)) = \mathbb{P}(A)$$

for all $A \in \mathcal{F}$. Given such a $\mathbb{P}$, a measurable function $g : X \to \mathbb{R}$ is called $(T, \mathbb{P})$-invariant
if $g \circ T = g$, $\mathbb{P}$-almost surely, and a measurable set $A \in \mathcal{F}$ is called
$(T, \mathbb{P})$-invariant if $\mathbf{1}_A$ is $(T, \mathbb{P})$-invariant. One also defines a $T$-invariant set (or
simply invariant set) as a set $A \in \mathcal{F}$ such that $T^{-1}(A) = A$. Note that this definition
of invariance makes no reference to the measure $\mathbb{P}$ and that a $T$-invariant set is
clearly $(T, \mathbb{P})$-invariant.

A $T$-invariant probability measure $\mathbb{P}$ is called $T$-ergodic (or simply ergodic)
provided that every $(T, \mathbb{P})$-invariant function is $\mathbb{P}$-almost surely constant.


Example 4.34 A periodic point of period $d \ge 1$ for $T$ is a point $x \in X$ such that
$T^d(x) = x$ and $T^i(x) \neq x$ for $i = 1, \ldots, d - 1$. Given such a point, the measure

$$\frac{1}{d}\big(\delta_x + \delta_{T(x)} + \ldots + \delta_{T^{d-1}(x)}\big)$$

is $T$-ergodic.

Remark 4.35 One sometimes says that $T$ is ergodic with respect to $\mathbb{P}$ to mean that
$\mathbb{P}$ is $T$-ergodic.

Proposition 4.36 The following assertions are equivalent:

(a) The probability measure $\mathbb{P}$ is $T$-ergodic;
(b) Every $(T, \mathbb{P})$-invariant set has $\mathbb{P}$-measure 0 or 1;
(c) Every $T$-invariant set has $\mathbb{P}$-measure 0 or 1.


Proof The implications (a) $\Rightarrow$ (b) $\Rightarrow$ (c) are obvious. To show that (c) $\Rightarrow$ (b), let
$A$ be a $(T, \mathbb{P})$-invariant set. The set

$$\tilde{A} := \{x \in X : T^k(x) \in A \text{ for infinitely many } k \in \mathbb{N}\}$$

is invariant. Hence, by (c), $\mathbb{P}(\tilde{A}) \in \{0, 1\}$. If $x \in A \setminus \tilde{A}$, there exists $k \ge 1$ such that
$x \in A \setminus T^{-k}(A)$, and if $x \in \tilde{A} \setminus A$, there exists $k \ge 1$ such that $x \in T^{-k}(A) \setminus A$. It
then follows that

$$A \Delta \tilde{A} \subset \bigcup_{k \ge 1} A \Delta T^{-k}(A).$$

Thus

$$\mathbb{P}(A \Delta \tilde{A}) \le \sum_{k \ge 1} \mathbb{P}(A \Delta T^{-k}(A)).$$

Now

$$\mathbb{P}(A \Delta T^{-k}(A)) \le \sum_{i=0}^{k-1} \mathbb{P}\big(T^{-i}(A) \Delta T^{-(i+1)}(A)\big) = k\, \mathbb{P}(A \Delta T^{-1}(A)) = 0.$$

It remains to prove that (b) $\Rightarrow$ (a). Let $h$ be $(T, \mathbb{P})$-invariant. Then, for each $c \in \mathbb{R}$,
the set $\{x \in X : h(x) > c\}$ is $(T, \mathbb{P})$-invariant and the result follows. □


Exercise 4.37 (Rotations) Let $S^1 = \mathbb{R}/\mathbb{Z}$, $\alpha \in S^1$, and $T_\alpha : S^1 \to S^1$ the rotation
$x \mapsto x + \alpha$. Describe the invariant and ergodic probability measures of $T_\alpha$. Show that
when $\alpha$ is irrational (i.e., $\alpha = \xi + \mathbb{Z}$ with $\xi \in (0, 1) \setminus \mathbb{Q}$), $T_\alpha$ is uniquely ergodic and,
more precisely, the normalized Lebesgue measure $\lambda$ on $S^1$ is the unique invariant
probability measure for $T_\alpha$.


Exercise 4.38 Let $k \ge 2$ be an integer and $Z^k : S^1 \to S^1$, $x \mapsto kx$. Show that
the normalized Lebesgue measure $\lambda$ is ergodic for $Z^k$. Show that $Z^k$ has infinitely
many periodic points, hence infinitely many ergodic measures.


Exercise 4.39 (Shift) Let $M = \{0, 1\}^{\mathbb{N}^*}$ and let $\Theta$ be the shift map on $M$ defined
by $\Theta(\omega)_i = \omega_{i+1}$. Show the following statements.

(a) For all $n \ge 1$, $\Theta$ has $2^n$ periodic orbits of period $n$, and the set of periodic
points is dense in $M$;

(b) There is a point $x \in M$ whose orbit is dense in $M$;

(c) The probability measure $\big(\frac{1}{2}(\delta_0 + \delta_1)\big)^{\otimes \mathbb{N}^*}$ is ergodic for $\Theta$;

(d) There exists a continuous surjective map $\Psi : M \to S^1$ such that

$$\Psi \circ \Theta = Z^2 \circ \Psi,$$

where $Z^2$ is defined as in Exercise 4.38. Hint: One can use Exercise 4.11.

Using (d), prove that $Z^2$ possesses a dense orbit and give an alternative proof of the
results of Exercise 4.38 when $k = 2$.


Exercise 4.40 Let $T : S^1 \times S^1 \to S^1 \times S^1$, $(x, y) \mapsto (x + \alpha, y + x)$ with $\alpha$ irrational.
Show that $\lambda \otimes \lambda$ is ergodic. Hint: One can use the fact that every $f \in L^2(\lambda \otimes \lambda)$ can be
written as a Fourier series $f(x, y) = \sum_{k, l \in \mathbb{Z}} c_{kl}\, e_k(x) e_l(y)$, where $e_k(x) = e^{2i\pi k x}$
and $\sum_{k, l} |c_{kl}|^2 < \infty$.


4.6.1 Poincaré, Birkhoff, and Ergodic Decomposition Theorems


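The first important result from ergodic theory is the Poincaré recurrence theorem.
Notice that there is no assumption here that $\mathbb{P}$ is ergodic.


Theorem 4.41 (Poincaré Recurrence Theorem) Let $\mathbb{P}$ be a $T$-invariant probability
measure. For every measurable set $A \subset X$,

$$\mathbb{P}(A) = \mathbb{P}(\{x \in A : T^n(x) \in A \text{ for infinitely many } n\}).$$

Proof For $N \in \mathbb{N}$, let

$$B_N := \{x \in A : \{T^n(x) : n \ge N\} \subset X \setminus A\}.$$

Then $T^{-n}(B_1) \cap B_1 = \emptyset$ for all $n \ge 1$. Hence $T^{-n}(B_1) \cap T^{-m}(B_1) = \emptyset$ for all
$m, n \in \mathbb{N}$ with $n \neq m$. Thus

$$1 \ge \sum_{n \in \mathbb{N}} \mathbb{P}(T^{-n}(B_1)) = \sum_{n \in \mathbb{N}} \mathbb{P}(B_1)$$

and $\mathbb{P}(B_1) = 0$. Replacing $T$ with $T^N$ proves that $\mathbb{P}(B_N) = 0$. □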


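Let $\mathcal{I}$ denote the set of all invariant sets. Then $\mathcal{I}$ is a $\sigma$-field. The next result is the
celebrated pointwise Birkhoff ergodic theorem. The proof given here follows [42]
and goes back to Neveu.


Theorem 4.42 (Birkhoff Ergodic Theorem) Let $\mathbb{P}$ be a $T$-invariant probability
measure and let $f \in L^1(\mathbb{P})$. Then $\hat{f} := \mathbb{E}(f \mid \mathcal{I})$ is $(T, \mathbb{P})$-invariant and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f \circ T^i = \hat{f}$$

$\mathbb{P}$-almost surely. In particular, if $\mathbb{P}$ is $T$-ergodic, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f \circ T^i = \mathbb{E}(f)$$

$\mathbb{P}$-almost surely.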


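Proof For $f \in L^1(\mathbb{P})$, set $S_n(f)(x) := \sum_{i=0}^{n-1} f \circ T^i(x)$ and $\hat{f} := \mathbb{E}(f \mid \mathcal{I})$. We
claim that

$$\hat{f} < 0,\ \mathbb{P}\text{-almost surely} \implies \limsup_{n \to \infty} \frac{S_n(f)}{n} \le 0,\ \mathbb{P}\text{-almost surely}.$$

Let us first derive the theorem from the claim. For $\varepsilon > 0$, set $f_\varepsilon := f - \hat{f} - \varepsilon$.
Then $\hat{f}_\varepsilon = -\varepsilon < 0$, and since $\hat{f}$ is $(T, \mathbb{P})$-invariant (the proof is easy and left to the
reader),

$$\limsup_{n \to \infty} \frac{S_n(f)}{n} - \hat{f} - \varepsilon = \limsup_{n \to \infty} \frac{S_n(f_\varepsilon)}{n} \le 0.$$

Thus, $\varepsilon$ being arbitrary, $\limsup_{n \to \infty} \frac{S_n(f)}{n} \le \hat{f}$. Similarly, $\liminf_{n \to \infty} \frac{S_n(f)}{n} \ge \hat{f}$.

We now move on to the proof of the claim. For $n \in \mathbb{N}^*$ and $x \in X$, let

$$F_n(x) := \max\{S_k(f)(x) : k = 1, \ldots, n\},$$

$F_\infty(x) := \lim_{n \to \infty} F_n(x) \in \mathbb{R} \cup \{\infty\}$, and $A := \{F_\infty = \infty\}$. Clearly

$$\limsup_{n \to \infty} \frac{S_n(f)}{n} \le 0$$

on $X \setminus A$ and it suffices to prove that $\mathbb{P}(A) = 0$. Now observe that $F_{n+1} - F_n \circ
T = f - \min(0, F_n \circ T)$. Consequently, $A \in \mathcal{I}$ and $(F_{n+1} - F_n \circ T)$ decreases to
$f - \min(0, F_\infty \circ T)$. In particular, by monotone convergence, $\lim_{n \to \infty} \mathbb{E}((F_{n+1} -
F_n \circ T)\mathbf{1}_A) = \mathbb{E}(f \mathbf{1}_A) = \mathbb{E}(\hat{f} \mathbf{1}_A)$. By $T$-invariance of $\mathbb{P}$, the left-hand side is
nonnegative. Hence, if $\hat{f} < 0$, $\mathbb{P}$-almost surely, then necessarily $\mathbb{P}(A) = 0$. □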
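
The convergence of time averages can be checked on a circle rotation with an irrational angle, for which normalized Lebesgue measure is ergodic. The sketch below uses the assumed choices $\alpha = \sqrt{2} - 1$ and $f(x) = \cos(2\pi x)$, whose space average is $0$; the function names and the starting point are hypothetical. For this particular $f$ the Birkhoff sums are geometric sums, so the time average is bounded in absolute value by $1/(n\,|\sin(\pi\alpha)|)$, which the code compares against.

```python
import math

def birkhoff_average(f, iterate, x, n):
    """Time average (1/n) * sum_{i=0}^{n-1} f(T^i(x)); iterate(x, i) returns T^i(x)."""
    return sum(f(iterate(x, i)) for i in range(n)) / n

alpha = math.sqrt(2) - 1  # floating-point stand-in for an irrational angle

def rotate(x, i):
    """i-th iterate of x -> x + alpha mod 1, computed in one step to limit rounding drift."""
    return (x + i * alpha) % 1.0

def f(x):
    """Test function with space average 0 under Lebesgue measure on the circle."""
    return math.cos(2 * math.pi * x)

x0 = 0.2
averages = {n: birkhoff_average(f, rotate, x0, n) for n in (10, 100, 1000, 10000)}
bounds = {n: 1.0 / (n * abs(math.sin(math.pi * alpha))) for n in averages}
for n in averages:
    print(n, averages[n], bounds[n])
```

The $1/n$ rate seen here is special to trigonometric observables on rotations; the theorem itself gives no rate in general.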


The next theorem, known as the ergodic decomposition theorem, shows that
every invariant measure on a Borel subset of a Polish space equipped with the Borel
$\sigma$-field can be written as a "sum" of ergodic measures.


Theorem 4.43 (Ergodic Decomposition Theorem) Let $M$ be a Borel subset of
a Polish space, with Borel $\sigma$-field $\mathcal{B}(M)$. Let $T : M \to M$ be a measurable
transformation. Every $T$-invariant probability measure $\mathbb{P}$ can be decomposed as

$$\mathbb{P}(\cdot) = \int_M P(x, \cdot)\, \mathbb{P}(dx),$$

where $P$ is a Markov kernel on $(M, \mathcal{B}(M))$ such that $P(x, \cdot)$ is ergodic for $\mathbb{P}$-almost
every $x$.


Before proving the ergodic decomposition theorem, we state without proof a 
lemma that can be deduced from Theorem 10.2.2 in [21] and the monotone class 
theorem in the appendix. 


Lemma 4.44 Let $M$ be a Borel subset of a Polish space, with Borel $\sigma$-field $\mathcal{B}(M)$.
Let $\mathbb{P}$ be a probability measure on $(M, \mathcal{B}(M))$, and let $\mathcal{A}$ be a sub-$\sigma$-field of
$\mathcal{B}(M)$. Then there exists a Markov kernel $P$ on $(M, \mathcal{B}(M))$ such that for every
$f \in B(M)$, $Pf$ is a representative of $\mathbb{E}(f \mid \mathcal{A})$, i.e., $Pf$ is $\mathcal{A}$-measurable and
$\mathbb{E}(\mathbf{1}_A Pf) = \mathbb{E}(\mathbf{1}_A f)$ for every $A \in \mathcal{A}$.

Proof (Theorem 4.43) Recall that $\mathcal{I}$ denotes the $\sigma$-field of $T$-invariant sets in $\mathcal{B}(M)$.
By Lemma 4.44, there is a Markov kernel $P$ on $(M, \mathcal{B}(M))$ such that for every
$f \in B(M)$, $Pf$ is a representative of $\mathbb{E}(f \mid \mathcal{I})$. This yields

$$\mathbb{P}(A) = \mathbb{E}\big(\mathbb{E}(\mathbf{1}_A \mid \mathcal{I})\big) = \int_M P(x, A)\, \mathbb{P}(dx), \qquad \forall A \in \mathcal{B}(M).$$


As a subset of a separable metric space, M is separable (see Exercise 4.45 (ii) 
below). Proposition 4.5 implies the existence of a countable family {fn}nen C 
Cp(M) such that for every u,v € P(M), u = v if and only if uf, = vf, 
for all n € N. For every n € N, Pf, is a representative of E(f,|Z), and 
x P P(x, THE = P(fa o T)(x) is a representative of E( f, o T|Z). Since 
P is T-invariant, we have IE( f, |Z) = E( fn o T|Z) for every n € N, hence P(x, -) is 
T -invariant for P-almost every x. 

To show that P(x, -) is ergodic for P-almost every x, we follow the proof of 
Theorem 6.2 in [24]. Since M is a separable metric space, the o-field B(M) is 
countably generated, i.e., there is a countable family of sets {An}nen such that 
B(M) = o (An : n € N) (see Exercise 4.45 (iii)). As a result, L' (M, B(M), P) is 
separable (see parts (i) and (ii) of Exercise 4.46 below). Since the set (14 : A € Z} 
is contained in L' (M, B(M), P), it is also separable in the L!-topology, so there is 
a countable family {An}nen C Z such that for every A € Z and for every € > 0, 
there is n € N with P(AAA;) < e. 

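Let $\mathcal{I}_0 := \sigma(A_n : n \in \mathbb{N})$. By definition, $\mathcal{I}_0$ is a countably generated
sub-$\sigma$-field of $\mathcal{I}$. Moreover, $\mathcal{I}_0$ and $\mathcal{I}$ are $\mathbb{P}$-equivalent, i.e.,
for every $A \in \mathcal{I}$ there is $B \in \mathcal{I}_0$ such that $\mathbb{P}(A \Delta B) = 0$ (see
Exercise 4.46 (iii)). As $\mathcal{I}$ need not be countably generated (see Exercise 4.48
below), we will work with $\mathcal{I}_0$ in the remainder of the proof.
Applying Lemma 4.44 to $\mathcal{I}_0$, we obtain a Markov kernel $Q$ on $(M, \mathcal{B}(M))$ such
that for every $f \in B(M)$, $Qf$ is a representative of $\mathbb{E}(f \mid \mathcal{I}_0)$. Let
$\{f_n\}_{n \in \mathbb{N}} \subset C_b(M)$ be as above. For $n \in \mathbb{N}$, consider the function
$h_n := Q f_n$. Since $\mathcal{I}$ and $\mathcal{I}_0$ are $\mathbb{P}$-equivalent,
$\mathbb{E}(f_n \mid \mathcal{I}) = \mathbb{E}(f_n \mid \mathcal{I}_0)$. As a result, there is $M^1 \in \mathcal{B}(M)$ such that
$\mathbb{P}(M^1) = 1$ and for every $x \in M^1$, $\mathbf{P}(x, \cdot)$ is $T$-invariant and

$$h_n(x) = \mathbf{P}(x, \cdot) f_n, \quad \forall n \in \mathbb{N}.$$

Hence, $Q(x, \cdot) = \mathbf{P}(x, \cdot)$ is $T$-invariant for every $x \in M^1$. By Birkhoff's ergodic
theorem 4.42, there is $M^2 \subset M^1$ such that $\mathbb{P}(M^2) = 1$ and for every $x \in M^2$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} f_n(T^k(x)) = h_n(x), \quad \forall n \in \mathbb{N}.$$

And as both $Q(\cdot, A_n)$ and $\mathbf{1}_{A_n}$ are representatives of
$\mathbb{E}(\mathbf{1}_{A_n} \mid \mathcal{I}_0)$, there is $M^3 \subset M^2$ such that $\mathbb{P}(M^3) = 1$ and
$Q(x, A_n) = \mathbf{1}_{A_n}(x)$ for every $x \in M^3$ and $n \in \mathbb{N}$. Finally, as $Q(\cdot, M^3)$
is a representative of $\mathbb{E}(\mathbf{1}_{M^3} \mid \mathcal{I}_0)$ and as $\mathbb{P}(M^3) = 1$, there is
$M^4 \subset M^3$ such that $\mathbb{P}(M^4) = 1$ and $Q(x, M^3) = 1$ for every $x \in M^4$.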
Let us show that $Q(x, \cdot)$ is ergodic for every $x \in M^4$, which will complete the
proof of the ergodic decomposition theorem. Fix $x \in M^4$ and $A \in \mathcal{I}$. In light of
Proposition 4.36, it is enough to show that $Q(x, A) \in \{0, 1\}$. If $Q(x, A) = 0$, we
are done. If $Q(x, A) > 0$, consider the probability measure

$$\nu(B) := \frac{Q(x, A \cap B)}{Q(x, A)}, \quad B \in \mathcal{B}(M).$$

Since $\nu(A) = 1$, it suffices to show that $Q(x, \cdot) = \nu$, which will follow from

$$h_n(x) = \nu f_n, \quad \forall n \in \mathbb{N}. \tag{4.13}$$


Set

$$[x] := \bigcap_{A \in \mathcal{I}_0 :\, x \in A} A.$$

By Exercise 4.47 (i) below, one has

$$[x] = \bigcap_{n :\, x \in A_n} A_n \;\cap \bigcap_{n :\, x \notin A_n} A_n^c \tag{4.14}$$

and $[x] \in \mathcal{I}_0$. Fix $n \in \mathbb{N}$. Since $h_n$ is $\mathcal{I}_0$-measurable, it is constant on
the set $[x]$ by Exercise 4.47 (ii). Therefore, we have for every $y \in [x] \cap M^3$:

$$h_n(x) = h_n(y) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} f_n(T^k(y)).$$

Since $x \in M^3$, the representation of $[x]$ in (4.14) implies $Q(x, [x]) = 1$ and
thus $Q(x, [x] \cap M^3) = 1$. Since $Q(x, \cdot)$ is $T$-invariant, another application of
Birkhoff's ergodic theorem then yields that the constant $h_n(x)$ is a representative
of $\mathbb{E}_{Q(x,\cdot)}(f_n \mid \mathcal{I})$, where $\mathbb{E}_{Q(x,\cdot)}$ denotes expectation with respect to
$Q(x, \cdot)$. Consequently,

$$h_n(x)\, Q(x, A) = \mathbb{E}_{Q(x,\cdot)}(\mathbf{1}_A h_n(x)) = \mathbb{E}_{Q(x,\cdot)}(\mathbf{1}_A f_n) = \int_M \mathbf{1}_A(z) f_n(z)\, Q(x, dz).$$

Dividing both sides by $Q(x, A)$ gives (4.13). □


Exercise 4.45 (Properties of Separable Metric Spaces) Let (M, d) be a separa- 
ble metric space. 


(i) Let $D \subset M$ be countable and dense. Show that $\{B(x, r) : x \in D,\ r \in \mathbb{Q},\ r > 0\}$
is a basis for the topology on $M$, where $B(x, r)$ stands for the open ball with
center $x$ and radius $r$.
(ii) Let $A$ be any subset of $M$. Show that $A$ with the metric induced from $M$ is
itself a separable metric space.
(iii) Show that the Borel $\sigma$-field $\mathcal{B}(M)$ is countably generated.


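Exercise 4.46 For an arbitrary probability space $(\Omega, \mathcal{F}, \mathbb{P})$, prove the following
statements:

(i) If $\mathcal{F}$ is countably generated, then $(\Omega, \mathcal{F}, \mathbb{P})$ is separable, i.e., there is a
countable family $\mathcal{D} \subset \mathcal{F}$ such that for every $A \in \mathcal{F}$ and $\varepsilon > 0$ there is
$B \in \mathcal{D}$ with $\mathbb{P}(A \Delta B) < \varepsilon$.
(ii) If $(\Omega, \mathcal{F}, \mathbb{P})$ is separable, then $L^1(\Omega, \mathcal{F}, \mathbb{P})$ is a separable metric space.
(iii) If $(\Omega, \mathcal{F}, \mathbb{P})$ is separable, then for every $A \in \mathcal{F}$ there is $B \in \sigma(\mathcal{D})$ such that
$\mathbb{P}(A \Delta B) = 0$.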


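Exercise 4.47 Let $(\Omega, \mathcal{F})$ be a measurable space, let $\{A_n\}_{n \in \mathbb{N}} \subset \mathcal{F}$ be a countable
family of sets, and let $\mathcal{A} := \sigma(A_n : n \in \mathbb{N})$. For $x \in \Omega$, set

$$[x]_{\mathcal{A}} := \bigcap_{A \in \mathcal{A} :\, x \in A} A.$$

(i) Show that for every $x \in \Omega$,

$$[x]_{\mathcal{A}} = \bigcap_{n :\, x \in A_n} A_n \;\cap \bigcap_{n :\, x \notin A_n} A_n^c,$$

and deduce that $[x]_{\mathcal{A}} \in \mathcal{A}$.
(ii) Let $f : \Omega \to \mathbb{R}$ be $\mathcal{A}$-measurable and let $x \in \Omega$. Show that $f$ is constant on
$[x]_{\mathcal{A}}$.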
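
For a finite family of generators, the formula in part (i) of Exercise 4.47 can be evaluated by hand or by machine. The following Python sketch is only an illustration on a hypothetical eight-point space; the names `atom`, `omega`, `generators`, and `f` are assumptions made for the example and do not come from the text.

```python
def atom(x, generators, omega):
    """Return [x] via the formula of part (i): intersect A_n when x is in A_n,
    and the complement of A_n otherwise."""
    cell = set(omega)
    for a in generators:
        cell &= a if x in a else (omega - a)
    return frozenset(cell)

# Hypothetical example: Omega = {0, ..., 7} with two generating sets.
omega = frozenset(range(8))
generators = [frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5})]

# The distinct atoms partition Omega.
atoms = {atom(x, generators, omega) for x in omega}

# f = 1_{A_1} + 2 * 1_{A_2} is measurable with respect to sigma(A_1, A_2).
def f(x):
    return (x in generators[0]) + 2 * (x in generators[1])

# Part (ii): a measurable function takes a single value on each atom.
constant_on_atoms = all(len({f(y) for y in cell}) == 1 for cell in atoms)
```

In this toy case the atoms are the four cells obtained by deciding, for each generator, whether a point lies inside or outside it, which is exactly what the intersection formula encodes.
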
The next exercise shows that $\mathcal{I}$, the $\sigma$-field of $T$-invariant sets, need not be
countably generated.

Exercise 4.48 Consider the rotation $T_\alpha$ of Exercise 4.37 with $\alpha$ irrational.
Let $\mathcal{I}$ be the $\sigma$-field of $T_\alpha$-invariant sets. Use the formula from Exercise 4.47 (i) to
show that $\mathcal{I}$ is not countably generated, even though $\mathcal{B}(S^1)$ is.


4.7 Application to Markov Chains 


Consider now the canonical chain introduced in Sect. 1.3, Proposition 1.8. Let
$\Theta : M^{\mathbb{N}} \to M^{\mathbb{N}}$ be the shift operator defined by $\Theta(\omega)_n := \omega_{n+1}$, and let
$\mathbb{P}_\nu$ be the law of the canonical chain with initial distribution $\nu$ and kernel $P$.
Recall that $\mathbb{P}_\nu$ is a probability measure over $M^{\mathbb{N}}$ characterized by (1.3).
Proposition 4.49

(i) $\mathbb{P}_\nu$ is $\Theta$-invariant if and only if $\nu \in \mathrm{Inv}(P)$.
(ii) Let $\nu \in \mathrm{Inv}(P)$ and let $h \in L^1(\mathbb{P}_\nu)$ be $(\Theta, \mathbb{P}_\nu)$-invariant. For $x \in M$ such that
$h \in L^1(\mathbb{P}_x)$, let

$$\bar{h}(x) := \mathbb{E}_x(h) = \int h \, d\mathbb{P}_x.$$

Then

(a) $h(\omega) = \bar{h}(\omega_0)$, $\mathbb{P}_\nu$-almost surely;
(b) $\bar{h}$ is $(P, \nu)$-invariant.

(iii) $\mathbb{P}_\nu$ is $\Theta$-ergodic if and only if $\nu$ is $P$-ergodic.


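Proof

(i) This follows easily from the definitions.

(ii) Let $h \in L^1(\mathbb{P}_\nu)$ be $(\Theta, \mathbb{P}_\nu)$-invariant. For $n \in \mathbb{N}$, set
$h_n := \mathbb{E}_\nu(h \mid \mathcal{F}_n)$. By Doob's martingale convergence theorem (Theorem A.7),
$h_n$ converges $\mathbb{P}_\nu$-almost surely, hence in probability, to $h$. In particular, for
all $\varepsilon > 0$, $\lim_{n \to \infty} \mathbb{P}_\nu(|h_{n+1} - h_n| > \varepsilon) = 0$. By
$(\Theta, \mathbb{P}_\nu)$-invariance of $h$ and by the Markov property from Proposition 1.10,

$$h_n = \mathbb{E}_\nu(h \circ \Theta^n \mid \mathcal{F}_n) = \mathbb{E}_{\omega_n}(h) = \bar{h}(\omega_n).$$

Thus,

$$\mathbb{P}_\nu(|h_{n+1} - h_n| > \varepsilon) = \mathbb{P}_\nu(|\bar{h}(\omega_{n+1}) - \bar{h}(\omega_n)| > \varepsilon). \tag{4.15}$$

Since $\nu \in \mathrm{Inv}(P)$, (i) implies that $\mathbb{P}_\nu$ is $\Theta$-invariant. The expression on the
right-hand side of (4.15) thus equals $\mathbb{P}_\nu(|\bar{h}(\omega_1) - \bar{h}(\omega_0)| > \varepsilon)$, which
does not depend on $n$ and must therefore vanish. This proves that $h_n = h_0 = h$.
Also, by the Markov property,
$P\bar{h}(x) = \mathbb{E}_x(\mathbb{E}_{\omega_1}(h)) = \mathbb{E}_x(\mathbb{E}_x(h \circ \Theta \mid \mathcal{F}_1)) = \mathbb{E}_x(h \circ \Theta)$.
And as $h$ is $(\Theta, \mathbb{P}_\nu)$-invariant, we have for $\nu$-almost every $x \in M$ that
$\mathbb{E}_x(h \circ \Theta) = \bar{h}(x)$.

(iii) Let $\nu$ be $P$-ergodic. We will show that every $(\Theta, \mathbb{P}_\nu)$-invariant function
$h \in L^1(\mathbb{P}_\nu)$ is $\mathbb{P}_\nu$-almost surely constant. In particular, every
$(\Theta, \mathbb{P}_\nu)$-invariant set has $\mathbb{P}_\nu$-measure 0 or 1, so $\mathbb{P}_\nu$ is $\Theta$-ergodic by
Proposition 4.36. If $h \in L^1(\mathbb{P}_\nu)$ is $(\Theta, \mathbb{P}_\nu)$-invariant, then $\bar{h}$ is
$\nu$-almost surely constant by (ii) and $P$-ergodicity of $\nu$. By (ii), this proves
that $h$ is $\mathbb{P}_\nu$-almost surely constant.
Conversely, assume that $\mathbb{P}_\nu$ is $\Theta$-ergodic. Let $A$ be a $(P, \nu)$-invariant set.
Set $\tilde{A} := \{\omega \in M^{\mathbb{N}} : \omega_0 \in A\}$. Then
$\mathbb{P}_\nu(\tilde{A} \cap \Theta^{-1}(\tilde{A})) = \int_A \nu(dx) P(x, A) = \nu(A) = \mathbb{P}_\nu(\tilde{A})$.
This shows that $\tilde{A}$ is $(\Theta, \mathbb{P}_\nu)$-invariant. Hence
$\nu(A) = \mathbb{P}_\nu(\tilde{A}) \in \{0, 1\}$.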
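Theorem 4.50 Let $P$ be a Markov kernel, $\mu \in \mathrm{Inv}(P)$, and $h \in L^1(\mathbb{P}_\mu)$. Then
there exist a set $N \in \mathcal{B}(M)$ and a function $\bar{h} \in L^1(\mu)$ such that $\mu(N) = 1$ and, for
all $x \in N$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} h \circ \Theta^k(\omega) = \bar{h}(x)$$

$\mathbb{P}_x$-almost surely. If $\mu$ is ergodic, then $\bar{h}(x) = \mathbb{E}_\mu(h)$.

Proof By Birkhoff's ergodic theorem, $\frac{1}{n} \sum_{k=0}^{n-1} h \circ \Theta^k(\omega)$ converges
$\mathbb{P}_\mu$-almost surely to a $(\Theta, \mathbb{P}_\mu)$-invariant function $\hat{h} \in L^1(\mathbb{P}_\mu)$. According to
Proposition 4.49 (ii), $\hat{h}(\omega) = \bar{h}(\omega_0)$, $\mathbb{P}_\mu$-almost surely, where
$\bar{h}(\omega_0) := \mathbb{E}_{\omega_0}(\hat{h})$. To conclude the proof, we use the fact that
$\mathbb{P}_\mu(\cdot) = \int_M \mathbb{P}_x(\cdot)\, \mu(dx)$. □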
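
Theorem 4.50 can be observed numerically on a finite state space. The sketch below is a hypothetical illustration, not part of the original text: the two-state kernel `P`, the path functional $h(\omega) = \mathbf{1}\{\omega_0 = 0,\ \omega_1 = 1\}$, the seed, and the helper `simulate_path` are all assumptions chosen for the example. The chain is ergodic, so the time average of $h \circ \Theta^k$ should approach $\mathbb{E}_\mu(h) = \mu(0) P(0, 1)$.

```python
import random

def simulate_path(P, x0, n_steps, rng):
    """Sample X_0, ..., X_{n_steps} for a finite chain with transition matrix P."""
    path = [x0]
    states = range(len(P))
    for _ in range(n_steps):
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return path

# Hypothetical two-state kernel; its unique invariant law is mu = (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
mu = [5 / 6, 1 / 6]

# h(omega) = 1 if omega_0 = 0 and omega_1 = 1, so E_mu(h) = mu(0) * P(0, 1).
expected = mu[0] * P[0][1]

rng = random.Random(0)
n = 200_000
path = simulate_path(P, x0=1, n_steps=n, rng=rng)

# (1/n) * sum_{k<n} h(Theta^k omega) counts the transitions 0 -> 1 along the path.
time_average = sum(1 for k in range(n) if path[k] == 0 and path[k + 1] == 1) / n
```

The starting point `x0=1` is deliberate: the limit does not depend on it, in line with the statement that the convergence holds $\mathbb{P}_x$-almost surely for $\mu$-almost every $x$.
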


The next theorem is the ergodic decomposition theorem for a Markov kernel.


Theorem 4.51 Let $M$ be a Borel subset of a Polish space and let $P$ be a Markov
kernel on $(M, \mathcal{B}(M))$. Every $P$-invariant probability measure $\mu$ can be decomposed
as

$$\mu(\cdot) = \int_M Q(x, \cdot)\, \mu(dx), \tag{4.16}$$

where $Q$ is a Markov kernel on $(M, \mathcal{B}(M))$ such that $Q(x, \cdot)$ is $P$-ergodic for
$\mu$-almost every $x$.


Proof Let $\mathcal{I}(P, \mu)$ be the collection of $(P, \mu)$-invariant sets in $\mathcal{B}(M)$. In
Exercise 4.52 below, you are asked to show that $\mathcal{I}(P, \mu)$ is a $\sigma$-field. By Lemma 4.44,
there is a Markov kernel $Q$ on $(M, \mathcal{B}(M))$ such that for every $f \in B(M)$, $Qf$ is
a representative of $\mathbb{E}_\mu(f \mid \mathcal{I}(P, \mu))$, where $\mathbb{E}_\mu$ denotes expectation with respect to
$\mu$. In complete analogy to the proof of Theorem 4.43, this yields the representation
in (4.16).

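It remains to show that $Q(x, \cdot)$ is $P$-ergodic for $\mu$-almost every $x \in M$. Let
$(\bar{M}, d)$ be a Polish space such that $M$ is a Borel subset of $\bar{M}$. The space
$\bar{M}^{\mathbb{N}}$ equipped with the metric

$$e(\omega, \alpha) := \sum_{i \ge 0} \frac{1}{2^i}\, \frac{d(\omega_i, \alpha_i)}{1 + d(\omega_i, \alpha_i)}$$

is Polish as well; the corresponding Borel $\sigma$-field equals the product $\sigma$-field
$\mathcal{B}(\bar{M})^{\otimes \mathbb{N}}$. Thus, $M^{\mathbb{N}}$ is a Borel subset of the Polish space $\bar{M}^{\mathbb{N}}$. By
Proposition 4.49 (i), the Markov measure $\mathbb{P}_\mu$ on $(M^{\mathbb{N}}, \mathcal{B}(M)^{\otimes \mathbb{N}})$ is
$\Theta$-invariant. Hence, by the ergodic decomposition theorem 4.43, there is a Markov
kernel $\mathbf{P}$ on $(M^{\mathbb{N}}, \mathcal{B}(M)^{\otimes \mathbb{N}})$ such that

$$\mathbb{P}_\mu(\cdot) = \int_{M^{\mathbb{N}}} \mathbf{P}(\omega, \cdot)\, \mathbb{P}_\mu(d\omega),$$

and $\mathbf{P}(\omega, \cdot)$ is $\Theta$-ergodic for $\mathbb{P}_\mu$-almost every $\omega \in M^{\mathbb{N}}$. Moreover, as seen in
the proof of Theorem 4.43, $\mathbf{P}f$ is a representative of $\mathbb{E}_\mu(f \mid \mathcal{I})$ for every
$f \in B(M^{\mathbb{N}})$, where $\mathcal{I}$ is the $\sigma$-field of $\Theta$-invariant sets in $\mathcal{B}(M)^{\otimes \mathbb{N}}$. We
will now relate the Markov kernels $Q$ and $\mathbf{P}$ by showing that $\mathbb{P}_\mu$-almost surely,

$$\mathbf{P}(\omega, \cdot) = \mathbb{P}_{Q(\omega_0, \cdot)}.$$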
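Let $(F_n)_{n \in \mathbb{N}} \subset C_b(M^{\mathbb{N}})$ be such that for every $\mathbb{P}, \mathbb{Q} \in \mathcal{P}(M^{\mathbb{N}})$,
$\mathbb{P} = \mathbb{Q}$ if and only if $\mathbb{P}F_n = \mathbb{Q}F_n$ for all $n \in \mathbb{N}$. In Exercise 1.9, we
introduced the canonical projections
$\pi_n : M^{\mathbb{N}} \to M^{n+1}$, $\omega \mapsto (\omega_i)_{i=0,\dots,n}$. We use $\pi_0$ to define the $\sigma$-field

$$\mathcal{J} := \{\pi_0^{-1}(A) : A \in \mathcal{I}(P, \mu)\}.$$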


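Claim: The $\sigma$-fields $\mathcal{J}$ and $\mathcal{I}$ are $\mathbb{P}_\mu$-equivalent.
Proof of the claim: Let $S \in \mathcal{I}$ and define $g(x) := \mathbb{P}_x(S)$ and

$$A := \{x \in M : g(x) = 1\}.$$

By Proposition 4.49 (ii)(b), $g$ is $(P, \mu)$-invariant. Since every $x \in A$ such that
$g(x) = Pg(x)$ satisfies $P(x, A) = 1$, it follows with Exercise 4.52 (i) that
$A \in \mathcal{I}(P, \mu)$, and hence $\pi_0^{-1}(A) \in \mathcal{J}$. By Proposition 4.49 (ii)(a),

$$\mathbf{1}_S(\omega) = g(\omega_0)$$

for $\mathbb{P}_\mu$-almost every $\omega$. In particular, $g(\omega_0) \in \{0, 1\}$, $\mathbb{P}_\mu$-almost surely, so

$$\mathbf{1}_S(\omega) = \mathbf{1}_A(\omega_0)$$

for $\mathbb{P}_\mu$-almost every $\omega$. Hence,

$$\mathbb{P}_\mu(S \,\Delta\, \pi_0^{-1}(A)) = 0.$$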


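Let us now fix a set $S \in \mathcal{J}$. Then there is $A \in \mathcal{I}(P, \mu)$ such that $S = \pi_0^{-1}(A)$. Set
$\tilde{S} := A^{\mathbb{N}} = \bigcap_{n \in \mathbb{N}} \pi_n^{-1}(A^{n+1})$. A simple induction argument using
$A \in \mathcal{I}(P, \mu)$ implies that $\mathbb{P}_\mu(\pi_n^{-1}(A^{n+1})) = \mu(A)$ for all $n \in \mathbb{N}$. Continuity of
$\mathbb{P}_\mu$ from above yields

$$\mathbb{P}_\mu(\tilde{S}) = \mu(A) = \mathbb{P}_\mu(S).$$

Since $\tilde{S} \subset S$, it follows that $\mathbb{P}_\mu(S \Delta \tilde{S}) = 0$. And in the proof of
Proposition 4.36, it was shown that for every $(\Theta, \mathbb{P}_\mu)$-invariant set $\tilde{S}$ there
is $\hat{S} \in \mathcal{I}$ such that $\mathbb{P}_\mu(\tilde{S} \Delta \hat{S}) = 0$. This completes the proof of the claim.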

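We now complete the proof of Theorem 4.51. Since $\mathcal{I}$ and $\mathcal{J}$ are $\mathbb{P}_\mu$-equivalent,
we have for every $n \in \mathbb{N}$ that the representatives of $\mathbb{E}_\mu(F_n \mid \mathcal{I})$ and the
representatives of $\mathbb{E}_\mu(F_n \mid \mathcal{J})$ are representatives of
$\mathbb{E}_\mu(F_n \mid \sigma(\mathcal{I}, \mathcal{J}))$. The function $\mathbf{P}F_n$ is a representative of
$\mathbb{E}_\mu(F_n \mid \mathcal{I})$ and thus also of $\mathbb{E}_\mu(F_n \mid \sigma(\mathcal{I}, \mathcal{J}))$. Let

$$\mathcal{F}_0 := \{\pi_0^{-1}(A) : A \in \mathcal{B}(M)\}.$$

For $n \in \mathbb{N}$, consider the functions

$$\bar{F}_n : M \to \mathbb{R}, \quad x \mapsto \mathbb{E}_x(F_n)$$

and

$$G_n : M^{\mathbb{N}} \to \mathbb{R}, \quad \omega \mapsto \bar{F}_n(\omega_0).$$

By the Markov property from Proposition 1.10, $G_n$ is a representative of
$\mathbb{E}_\mu(F_n \mid \mathcal{F}_0)$. As a result,

$$\mathbb{E}_\mu(F_n \mid \mathcal{J}) = \mathbb{E}_\mu(\mathbb{E}_\mu(F_n \mid \mathcal{F}_0) \mid \mathcal{J}) = \mathbb{E}_\mu(G_n \mid \mathcal{J}).$$

Next, observe that $\omega \mapsto Q\bar{F}_n(\omega_0)$ is a representative of $\mathbb{E}_\mu(G_n \mid \mathcal{J})$, and thus
also of $\mathbb{E}_\mu(F_n \mid \mathcal{J})$ and $\mathbb{E}_\mu(F_n \mid \sigma(\mathcal{I}, \mathcal{J}))$. This shows that

$$1 = \mathbb{P}_\mu\big(\{\omega \in M^{\mathbb{N}} : \mathbf{P}F_n(\omega) = Q\bar{F}_n(\omega_0)\}\big)
= \mathbb{P}_\mu\big(\{\omega \in M^{\mathbb{N}} : \mathbf{P}(\omega, \cdot) F_n = \mathbb{P}_{Q(\omega_0, \cdot)} F_n\}\big),$$

and hence

$$\mathbb{P}_\mu\big(\{\omega \in M^{\mathbb{N}} : \mathbf{P}(\omega, \cdot) = \mathbb{P}_{Q(\omega_0, \cdot)}\}\big) = 1.$$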

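Let $S \in \mathcal{B}(M)^{\otimes \mathbb{N}}$ be such that $\mathbb{P}_\mu(S) = 1$ and for every $\omega \in S$,
$\mathbf{P}(\omega, \cdot) = \mathbb{P}_{Q(\omega_0, \cdot)}$ and $\mathbf{P}(\omega, \cdot)$ is $\Theta$-ergodic. By Proposition 4.49 (iii),
$Q(\omega_0, \cdot)$ is $P$-ergodic for every $\omega \in S$. Since $S \in \mathcal{B}(M)^{\otimes \mathbb{N}}$ and since $\pi_0$ is
continuous, the set $\pi_0(S)$ is analytic (see Theorem 13.2.1 in [21]). Theorem 13.2.6
in [21] implies that there are $A, N \in \mathcal{B}(M)$ and $B \subset N$ such that $\mu(N) = 0$ and
$\pi_0(S) = A \cup B$. It follows that

$$1 = \mathbb{P}_\mu(S) \le \mathbb{P}_\mu(\pi_0^{-1}(A \cup N)) = \mu(A \cup N) \le \mu(A) + \mu(N) = \mu(A),$$

which completes the proof. □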
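
On a finite state space the decomposition (4.16) is elementary and can be checked exactly. The following Python sketch is a hypothetical illustration under stated assumptions: the reducible kernel `P`, the weights 3/10 and 7/10, and the helpers `stationary_two_state` and `push` are choices made for the example, not objects from the text. The kernel $Q(x, \cdot)$ is taken to be the ergodic measure carried by the closed class containing $x$.

```python
from fractions import Fraction as F

def stationary_two_state(p, q):
    """Invariant law of the two-state kernel [[1 - p, p], [q, 1 - q]]."""
    return [q / (p + q), p / (p + q)]

def push(measure, kernel):
    """Return the row vector measure * kernel."""
    n = len(kernel)
    return [sum(measure[i] * kernel[i][j] for i in range(n)) for j in range(n)]

# Hypothetical reducible kernel on {0, 1, 2, 3} with closed classes {0, 1} and {2, 3}.
P = [[F(1, 2), F(1, 2), F(0), F(0)],
     [F(1, 4), F(3, 4), F(0), F(0)],
     [F(0), F(0), F(1, 3), F(2, 3)],
     [F(0), F(0), F(1), F(0)]]

pi_a = stationary_two_state(F(1, 2), F(1, 4)) + [F(0), F(0)]   # ergodic, carried by {0, 1}
pi_b = [F(0), F(0)] + stationary_two_state(F(2, 3), F(1))      # ergodic, carried by {2, 3}

# A non-ergodic invariant measure: a strict convex combination of the two.
mu = [F(3, 10) * a + F(7, 10) * b for a, b in zip(pi_a, pi_b)]

# Q(x, .) is the ergodic measure carried by the class of x.
Q = [pi_a, pi_a, pi_b, pi_b]

# The finite-state analogue of the integral of Q(x, .) against mu(dx).
decomposition = push(mu, Q)
```

Exact rational arithmetic is used so that the identities $\mu P = \mu$, $Q(x, \cdot) P = Q(x, \cdot)$, and $\mu = \sum_x \mu(x) Q(x, \cdot)$ can be compared with `==` rather than up to rounding.
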
Exercise 4.52 Let

$$\mathcal{I}(P, \mu) := \{A \in \mathcal{B}(M) : \mathbf{1}_A = P(\cdot, A)\ \ \mu\text{-a.s.}\}$$

be the collection of $(P, \mu)$-invariant sets in $\mathcal{B}(M)$.
(i) Show that

$$\mathcal{I}(P, \mu) = \{A \in \mathcal{B}(M) : \mu(\{x \in A : P(x, A^c) > 0\}) = 0\}.$$

(ii) With the help of the representation in part (i), show that $\mathcal{I}(P, \mu)$ is a $\sigma$-field.


Exercise 4.53 (Skew Product Chains) Let $M$, $N$ be two metric spaces and

$$T : M \times N \to N, \quad (x, y) \mapsto T_x(y)$$

a measurable map. Let $(X_n)$ be an $M$-valued Markov chain defined on some filtered
probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ and let $Y_0 \in N$ be an $\mathcal{F}_0$-measurable random
variable. Consider the stochastic process $(Y_n)$ defined by

$$Y_{n+1} = T_{X_n}(Y_n).$$

(i) Show that $(X_n, Y_n)$ is a Markov chain on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$.
(ii) Suppose $\mu \in \mathcal{P}(M)$ is an invariant probability measure for $(X_n)$ and
$\nu \in \mathcal{P}(N)$ is $T_x$-invariant for all $x \in M$. Show that $\mu \otimes \nu$ is invariant for $(X_n, Y_n)$.
(iii) We suppose here that $\mu$ is the unique invariant probability measure of $(X_n)$.

(a) Give an example where $\nu$ is $T_x$-ergodic, but $\mu \otimes \nu$ is not.
(b) (inspired by Lemma 2.1 in [29]) Suppose that $\mu \otimes \nu$ is ergodic for $(X_n, Y_n)$
and that for all $x \in M$, $T_x$ is 1-Lipschitz, i.e.,

$$d(T_x(y), T_x(z)) \le d(y, z)$$

for all $x \in M$, $y, z \in N$. Show that for all $f \in L_b(M \times N)$, $\mu$-almost all
$x \in M$, and all $y \in \mathrm{supp}(\nu)$,

$$\mathbb{P}_{x,y}\left( \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k, Y_k) = (\mu \otimes \nu)(f) \right) = 1.$$

Deduce that, if $\mathrm{supp}(\nu) = N$, then $(X_n, Y_n)$ is uniquely ergodic.
(iv) Using (iii) show that the map defined in Exercise 4.40 is uniquely ergodic.
Deduce that for all $\beta$ irrational, the sequence $(n^2 \beta)_{n \ge 1}$ is equidistributed on
S!. Hint: Choose P — 2a. See [28], Corollaries 1.12 and 1.13. 
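
The equidistribution statement in part (iv) can be explored numerically before it is proved. The sketch below is a hypothetical experiment and not part of the exercise: the value $\beta = \sqrt{2}$, the number of terms, the number of bins, and the helper `bin_frequencies` are assumptions made for illustration. It records how often $n^2 \beta \bmod 1$ falls into each of ten equal subintervals of $[0, 1)$.

```python
import math

def bin_frequencies(beta, n_terms, n_bins):
    """Empirical frequencies of (n^2 * beta) mod 1, n = 1..n_terms, over equal bins."""
    counts = [0] * n_bins
    for n in range(1, n_terms + 1):
        frac = (n * n * beta) % 1.0
        # min(...) guards against frac * n_bins rounding up to n_bins.
        counts[min(int(frac * n_bins), n_bins - 1)] += 1
    return [c / n_terms for c in counts]

# Hypothetical choice of an irrational beta and of the sample size.
freqs = bin_frequencies(math.sqrt(2), n_terms=20_000, n_bins=10)

# Largest deviation of a bin frequency from the uniform value 1/10.
max_dev = max(abs(f - 0.1) for f in freqs)
```

Such an experiment proves nothing, and floating-point arithmetic limits how far `n_terms` can be pushed, but it gives a feel for what unique ergodicity of the skew product implies for a single orbit.
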


Exercise 4.54 (Markov Rotations) With the notation of the preceding
Exercise 4.53, we assume here that $M = \{1, \dots, n\}$, $N = S^1$, $(X_n)$ is a Markov
chain on $M$ whose transition probability matrix $K$ is irreducible, and that for all
$i \in M$, $T_i(y) = y + \alpha_i$ for some $\alpha_i \in S^1$.

A circuit for $K$ is a sequence $(i_1, \dots, i_d)$ of $d \ge 1$ distinct points such that
$K(i_k, i_{k+1}) > 0$ for $k = 1, \dots, d$ and $i_{d+1} = i_1$. The purpose of this exercise is to
show that the chain $(X_n, Y_n)$ is uniquely ergodic if and only if there exists a circuit
$(i_1, \dots, i_d)$ such that $\alpha_{i_1} + \dots + \alpha_{i_d}$ is irrational.


(i) (preliminary) Let $D$ be a diagonal matrix whose entries $\theta_1, \dots, \theta_n$ are complex
numbers having modulus 1. Consider the linear equation

$$Ku = Du \tag{4.17}$$

with $u \in \mathbb{C}^n$. Assume that $u \in \mathbb{C}^n$ is a nonzero solution to (4.17). Show that:

(a) $|u_i| = |u_1|$ for $i = 1, \dots, n$;

(b) $K_{ij} > 0 \Rightarrow u_j = \theta_i u_i$;

(c) For every circuit $(i_1, \dots, i_d)$, $\theta_{i_1} \cdots \theta_{i_d} = 1$.

Prove that there exists a nonzero solution to (4.17) if and only if for every
circuit $(i_1, \dots, i_d)$, $\theta_{i_1} \cdots \theta_{i_d} = 1$.

(ii) Let $\mu$ be the unique invariant probability measure of $(X_n)$ and
$f = (f_1, \dots, f_n) \in L^2(\mu \otimes \lambda)$. Set $f_j(x) = \sum_{k \in \mathbb{Z}} u_j(k) e^{2i\pi k x}$ with
$\sum_{k \in \mathbb{Z}} |u_j(k)|^2 < \infty$. Show that $Pf = f$ if and only if $Ku(k) = D^k u(k)$ for
all $k \in \mathbb{Z}$, where $D$ is the diagonal matrix with entries
$e^{-2i\pi\alpha_1}, \dots, e^{-2i\pi\alpha_n}$ and
$u(k) = (u_j(k))_{j=1,\dots,n}$. Here $P$ stands for the kernel of $(X_n, Y_n)$.
(iii) Prove the desired result.


4.8 Continuous Time: Invariant Probabilities for Markov 
Processes 


Let $\{P_t\}_{t \ge 0}$ be a Markov semigroup on $M$, as defined in Sect. 1.5. A probability
measure $\mu \in \mathcal{P}(M)$ is called invariant for $\{P_t\}_{t \ge 0}$ if it is invariant for $P_t$, for all
$t \ge 0$, i.e., $\mu P_t = \mu$, $\forall t \ge 0$. As shown by the following simple example, being
invariant for some $P_t$ is not sufficient to be invariant for $\{P_t\}_{t \ge 0}$.


Example 4.55 Consider the deterministic continuous-time rotation on $M = \mathbb{R}/\mathbb{Z}$,
given by $X_t^x = (x + t) \bmod 1$. The associated semigroup is given by
$P_t f(x) = f(X_t^x)$. Its unique invariant probability measure is the uniform measure on $M$.
However, for all $k \in \mathbb{N}^*$ and $x \in M$, $\frac{1}{k} \sum_{i=1}^{k} \delta_{x + i/k}$ is invariant for $P_{1/k}$.


Nevertheless, existence of an invariant probability measure for some $P_t$ always
implies existence of some invariant probability measure for $\{P_t\}_{t \ge 0}$.


Proposition 4.56 Suppose $\mu$ is an invariant probability measure for $P_T$ for some
$T > 0$. Then

$$\frac{1}{T} \int_0^T \mu P_s \, ds$$

is invariant for $\{P_t\}_{t \ge 0}$.


Proof For all $r \ge 0$ and $f \in B(M)$,

$$\int_0^T \mu P_s P_r f \, ds = \int_0^T \mu P_{s+r} f \, ds = \int_r^{T+r} \mu P_s f \, ds = \int_0^T \mu P_s f \, ds,$$

where the last equality follows from the fact that, by $P_T$-invariance, the map
$s \mapsto \mu P_s f$ is $T$-periodic. □
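
Example 4.55 and Proposition 4.56 can be checked numerically for the rotation semigroup. The sketch below is a hypothetical illustration: the value `K = 3`, the test function `f`, the base point `x = 0.1`, and the helpers `mu_P` and `averaged` are assumptions made for the example. It takes $\mu = \frac{1}{k} \sum_i \delta_{x + i/k}$, which is invariant for $P_{1/k}$ but not for the whole semigroup, and approximates $\frac{1}{T} \int_0^T \mu P_s f \, ds$ with $T = 1/k$ by a midpoint rule.

```python
import math

K = 3  # hypothetical choice of k

def f(y):
    """A test function on R/Z whose integral over [0, 1) equals 1/2."""
    return math.sin(2 * math.pi * K * y) + math.cos(2 * math.pi * y) ** 2

def mu_P(s, x):
    """(mu P_s) f for mu = (1/K) * sum_i delta_{x + i/K} and the rotation semigroup."""
    return sum(f((x + i / K + s) % 1.0) for i in range(K)) / K

def averaged(x, period, n_grid=1000):
    """Midpoint rule for (1/T) * int_0^T (mu P_s) f ds with T = period."""
    h = period / n_grid
    return sum(mu_P((j + 0.5) * h, x) for j in range(n_grid)) / n_grid

x = 0.1
# int_0^1 f(y) dy: the sine term integrates to 0 and cos^2 to 1/2.
uniform_integral = 0.5
time_averaged_value = averaged(x, 1 / K)
```

For this test function the time-averaged measure integrates `f` to the same value as the uniform measure, as Proposition 4.56 predicts, whereas `mu_P(s, x)` itself depends on `s`.
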


We now introduce a Markov kernel whose invariant probability measures
coincide with the invariant probability measures of $\{P_t\}$. This kernel is usually called
the 1-resolvent (or simply the resolvent) of $\{P_t\}_{t \ge 0}$. It is defined, for all $f \in B(M)$,
as

$$Gf := \int_0^\infty e^{-t} P_t f \, dt. \tag{4.18}$$


Proposition 4.57 A probability measure $\mu$ is invariant for $G$ if and only if it is
invariant for $\{P_t\}_{t \ge 0}$.


Proof Suppose $\mu G = \mu$. Then, for all $f \in B(M)$ and $s \ge 0$,

$$\mu P_s f = \mu G P_s f = e^{s} \int_M \int_0^\infty e^{-(t+s)} P_{t+s} f(x) \, dt \, \mu(dx) = e^{s} \int_s^\infty e^{-t} \mu P_t f \, dt.$$

This shows, by a simple bootstrap argument, that $s \mapsto \mu P_s f$ is $C^1$ and that
$\frac{d}{ds} \mu P_s f \big|_{s=0} = 0$. Thus

$$\frac{d}{dt} (\mu P_t f) = \frac{d}{ds} \mu P_{t+s} f \Big|_{s=0} = \frac{d}{ds} \mu P_s (P_t f) \Big|_{s=0} = 0.$$

This proves that $\mu P_t f = \mu f$. The converse statement is obvious. □
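
For a two-state jump process the resolvent (4.18) can be computed both by quadrature and in closed form, which makes Proposition 4.57 easy to test. The sketch below is a hypothetical illustration: the rates `a`, `b`, the truncation `t_max`, and the helper names are assumptions made for the example. It uses the standard formula $P_t = \Pi + e^{-(a+b)t}(I - \Pi)$ for the two-state semigroup, where every row of $\Pi$ is the invariant law $\pi$.

```python
import math

a, b = 2.0, 1.0                      # hypothetical jump rates 0 -> 1 and 1 -> 0
pi = [b / (a + b), a / (a + b)]      # invariant law of the semigroup

def P_t(t):
    """Transition matrix at time t: Pi + exp(-(a + b) t) * (I - Pi)."""
    decay = math.exp(-(a + b) * t)
    return [[pi[j] + decay * ((i == j) - pi[j]) for j in range(2)] for i in range(2)]

def resolvent_by_quadrature(t_max=40.0, n_grid=40_000):
    """Trapezoidal approximation of G = int_0^infty e^{-t} P_t dt, cut at t_max."""
    h = t_max / n_grid
    G = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n_grid + 1):
        t = k * h
        w = h * math.exp(-t) * (0.5 if k in (0, n_grid) else 1.0)
        Pt = P_t(t)
        for i in range(2):
            for j in range(2):
                G[i][j] += w * Pt[i][j]
    return G

def push(measure, kernel):
    """Return the row vector measure * kernel."""
    return [sum(measure[i] * kernel[i][j] for i in range(2)) for j in range(2)]

G = resolvent_by_quadrature()
# Closed form for comparison: G = Pi + (I - Pi) / (1 + a + b).
G_exact = [[pi[j] + ((i == j) - pi[j]) / (1 + a + b) for j in range(2)] for i in range(2)]

pi_G = push(pi, G_exact)             # should reproduce pi
delta0_G = push([1.0, 0.0], G_exact) # a non-invariant law is moved by G
```

The Dirac mass at state 0 is not invariant for the semigroup and, consistently with the proposition, it is not invariant for `G` either.
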


One of the main benefits of Proposition 4.57 is that it allows certain notions
introduced for discrete-time chains to be extended easily to continuous-time processes.
For instance, an invariant probability measure for $\{P_t\}_{t \ge 0}$ is ergodic for $\{P_t\}_{t \ge 0}$
if it is ergodic for the Markov kernel $G$, as defined in Sect. 4.4. With such a
definition, the results of Sect. 4.4 as well as the ergodic decomposition theorem 4.51
apply. Another consequence is the continuous-time version of the ergodic theorem
given below as Proposition 4.58. We first define the notion of a progressive
process. A continuous-time process $(X_t)_{t \ge 0}$ defined on a filtered probability space
$(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is called progressively measurable (with respect to $\mathbb{F}$), or simply
progressive, if for all $t \ge 0$, the map $(s, \omega) \in [0, t] \times \Omega \mapsto X_s(\omega) \in M$ is
measurable with respect to $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$. A progressive process is obviously adapted.
Conversely, an adapted process having right-continuous (or left-continuous) paths
is progressive (see, e.g., [45] for a proof).


Proposition 4.58 Suppose $(X_t)_{t \ge 0}$ is a progressive Markov process with semigroup
$(P_t)_{t \ge 0}$. Let $U_1, U_2, \dots$ be a sequence of independent identically distributed random
variables having an exponential distribution with parameter 1 and independent of
$(X_t)_{t \ge 0}$. Set $T_0 = 0$, $T_{n+1} = T_n + U_{n+1}$ for $n \ge 0$, and $Y_n = X_{T_n}$ for $n \ge 0$. Then

(i) The process $(Y_n)$ is a Markov chain with kernel $G$;
(ii) For all $f \in B(M)$,

$$\lim_{t \to \infty} \left( \frac{1}{t} \int_0^t f(X_s) \, ds - \frac{1}{[t]} \sum_{k=0}^{[t]-1} Gf(Y_k) \right) = 0$$

almost surely, where $[t] := \max\{z \in \mathbb{Z} : z \le t\}$;
(iii) In particular, if $\mu$ is ergodic for $\{P_t\}_{t \ge 0}$ and $X_0$ is distributed according to $\mu$,
then

$$\lim_{t \to \infty} \frac{1}{t} \int_0^t f(X_s) \, ds = \mu(f)$$

almost surely.
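Proof

(i) Let $g, h_0, \dots, h_n \in B(M)$. Set
$\Sigma_n := \{(t_1, \dots, t_n) \in \mathbb{R}_+^n : t_1 \le t_2 \le \dots \le t_n\}$.
By Fubini's theorem and the Markov property,

$$\begin{aligned}
\mathbb{E}\big(g(Y_{n+1}) h_0(Y_0) \cdots h_n(Y_n)\big)
&= \int_{\Sigma_n} \int_0^\infty \mathbb{E}\big(g(X_{t_n + u}) h_0(X_0) h_1(X_{t_1}) \cdots h_n(X_{t_n})\big) e^{-u} \, du \; e^{-t_n} \, dt_1 \cdots dt_n \\
&= \int_{\Sigma_n} \left( \int_0^\infty \mathbb{E}\big(P_u g(X_{t_n}) h_0(X_0) h_1(X_{t_1}) \cdots h_n(X_{t_n})\big) e^{-u} \, du \right) e^{-t_n} \, dt_1 \cdots dt_n \\
&= \int_{\Sigma_n} \mathbb{E}\big(Gg(X_{t_n}) h_0(X_0) h_1(X_{t_1}) \cdots h_n(X_{t_n})\big) e^{-t_n} \, dt_1 \cdots dt_n \\
&= \mathbb{E}\big(Gg(Y_n) h_0(Y_0) \cdots h_n(Y_n)\big).
\end{aligned}$$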


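(ii) Fix $f \in B(M)$ and let $\mathbf{t} = (t_k)_{k \ge 1}$ be a deterministic increasing sequence of
positive numbers such that $t_n \to \infty$ and
$\limsup_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} (t_{k+1} - t_k)^2 < \infty$.
Let

$$M_n^{\mathbf{t}} = \int_0^{t_n} f(X_s) \, ds - \sum_{k=1}^{n} \int_0^{t_k - t_{k-1}} P_s f(X_{t_{k-1}}) \, ds,$$

with the convention that $t_0 = 0$. Then the sequence $(M_n^{\mathbf{t}})_{n \ge 0}$ is a martingale
with respect to $\{\mathcal{F}_{t_n}\}_{n \ge 0}$ such that
$\langle M \rangle_{n+1} - \langle M \rangle_n \le (t_{n+1} - t_n)^2 \|f\|_\infty^2$. Thus,
by the strong law of large numbers for martingales (see Theorem A.8),

$$\lim_{n \to \infty} \frac{M_n^{\mathbf{t}}}{n} = 0$$

almost surely.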
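Let now $\Sigma_\infty := \{\mathbf{t} \in \mathbb{R}_+^{\mathbb{N}^*} : 0 < t_1 \le t_2 \le \dots\}$ be equipped with its Borel
$\sigma$-field and let $\nu$ denote the law of $(T_n)_{n \ge 1}$. By what precedes, for $\nu$-almost
every $\mathbf{t} \in \Sigma_\infty$, one has $\lim_{n \to \infty} \frac{M_n^{\mathbf{t}}(\omega)}{n} = 0$ for $\mathbb{P}$-almost every
$\omega \in \Omega$. Thus, by Fubini's theorem, the convergence of $M_n^{\mathbf{t}}(\omega)/n$ to 0 holds
for $\nu \otimes \mathbb{P}$-almost every $(\mathbf{t}, \omega) \in \Sigma_\infty \times \Omega$.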

The sequence $(M'_n)_{n \ge 0}$ defined as

$$M'_n = \sum_{k=1}^{n} \left( \int_0^{T_k - T_{k-1}} P_s f(Y_{k-1}) \, ds - Gf(Y_{k-1}) \right)$$

is a martingale with respect to the filtration $\{\mathcal{G}_n\}_{n \ge 0}$, where
$\mathcal{G}_n = \sigma((Y_k, T_k) : 0 \le k \le n)$. Hence, relying again on the strong law of large numbers
for martingales, $\lim_{n \to \infty} M'_n / n = 0$ almost surely. Since
$\lim_{n \to \infty} T_n / n = \mathbb{E}(T_1) = 1$ holds $\mathbb{P}$-almost surely, the desired convergence follows.

(iii) If $\mu$ is ergodic for $(P_t)$ and $X_0 = Y_0$ has law $\mu$, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} Gf(Y_k) = \mu G f = \mu f$$

almost surely by application of Theorem 4.50.
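
Part (i) of Proposition 4.58 says that observing the process at an independent exponential time produces one step of the kernel $G$. The following Monte Carlo sketch is a hypothetical illustration of that statement for a two-state jump process: the rates in `RATES`, the seed, the sample size, and the helper `state_at` are assumptions made for the example. For this chain, $G = (I - L)^{-1}$ with $L$ the generator, which gives $G(0, \{1\}) = a / (1 + a + b)$.

```python
import random

RATES = (2.0, 1.0)   # hypothetical jump rates: leave state 0 at rate 2, state 1 at rate 1

def state_at(x0, t, rng):
    """State at time t of the two-state jump process started at x0."""
    x, clock = x0, 0.0
    while True:
        clock += rng.expovariate(RATES[x])   # exponential holding time in state x
        if clock > t:
            return x
        x = 1 - x

rng = random.Random(1)
n_samples = 50_000

# Y_1 = X_{T_1} with T_1 ~ Exp(1) independent of the process, started at X_0 = 0.
hits = sum(state_at(0, rng.expovariate(1.0), rng) for _ in range(n_samples))
empirical = hits / n_samples

a, b = RATES
G_01 = a / (1 + a + b)   # G(0, {1}) for this chain
```

The empirical frequency of the event $\{Y_1 = 1\}$ should be close to `G_01`, up to Monte Carlo error of order `n_samples ** -0.5`.
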
Notes


The proof of the ergodic decomposition theorem 4.51 for a Markov kernel is taken 
from unpublished lecture notes by Yuri Bakhtin [4]. 


Chapter 5
Irreducibility


This chapter discusses different versions of irreducibility. The first one, called 
$\xi$-irreducibility, or simply irreducibility, is a purely measure-theoretic notion
which generalizes the definition given for countable state spaces. An important 
characteristic of irreducible chains is that they have at most one invariant probability 
measure. Another notion, this time topological, is that the chain is indecomposable, 
in the sense that there exists at least one accessible point, i.e., a point whose 
every neighborhood has a positive probability of being touched by the chain 
regardless of the initial condition. Indecomposability is not a sufficient condition 
to ensure unique ergodicity for a Feller chain but it is for a strong Feller chain. 
Moreover, the accessible set provides valuable information about the support of 
invariant probability measures. The final section of the chapter introduces a
condition due to Hairer and Mattingly that is weaker than the strong Feller condition,
called the asymptotic strong Feller property, and studies the structure of the ergodic
measures for chains satisfying this condition. For an asymptotic strong Feller chain,
the ergodic measures have disjoint support. On a connected space, an invariant 
probability measure with full support is necessarily unique. 


5.1 Resolvent and $\xi$-Irreducibility


Given a (nonzero) Borel measure $\xi$ on $M$, $P$ is called $\xi$-irreducible if for every
Borel set $A \subset M$ and every $x \in M$

$$\xi(A) > 0 \;\Rightarrow\; \exists k \ge 0,\ P^k(x, A) > 0.$$

Equivalently,

$$\xi(A) > 0 \;\Rightarrow\; R_a(x, A) > 0,$$

where $R_a(\cdot, \cdot)$ is the resolvent kernel defined as

$$R_a(x, A) = (1 - a) \sum_{k \ge 0} a^k P^k(x, A)$$

for some $0 < a < 1$.
Remarks 5.1

(i) Let $(X_n)$ be a Markov chain with kernel $P$ and $(\Delta_n)$ a sequence of i.i.d.
random variables independent from $(X_n)$ having a geometric distribution with
parameter $a$, i.e.,

$$\mathbb{P}(\Delta_i = k) = a^k (1 - a), \quad k \in \mathbb{N}.$$

Then $R_a$ is the kernel of the sampled chain $Y_n = X_{T_n}$ with

$$T_n = \sum_{i=1}^{n} \Delta_i;$$

(ii) $P$ and $R_a$ have the same invariant probability measures;
(iii) If $P$ is $\xi$-irreducible, then for all $n \in \mathbb{N}$, $x \in M$, and $A \in \mathcal{B}(M)$ such that
$\xi(A) > 0$ there exists $k \ge n$ such that $P^k(x, A) > 0$.
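
The resolvent kernel $R_a$ is easy to compute for a finite chain, and doing so shows why it is convenient: it is strictly positive wherever some power of $P$ is. The sketch below is a hypothetical illustration: the deterministic 3-cycle `P`, the value `a = 0.5`, the truncation level `n_terms`, and the helpers `mat_mul` and `resolvent` are assumptions made for the example.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def resolvent(P, a, n_terms=200):
    """Truncated series (1 - a) * sum_{k < n_terms} a^k P^k."""
    n = len(P)
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # P^0 = identity
    R = [[0.0] * n for _ in range(n)]
    weight = 1.0 - a
    for _ in range(n_terms):
        for i in range(n):
            for j in range(n):
                R[i][j] += weight * power[i][j]
        power = mat_mul(power, P)
        weight *= a
    return R

# Hypothetical example: the deterministic 3-cycle 0 -> 1 -> 2 -> 0.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
R = resolvent(P, a=0.5)

# The uniform law is invariant for P; Remark 5.1 (ii) says it is invariant for R_a too.
uniform = [1 / 3, 1 / 3, 1 / 3]
uniform_R = [sum(uniform[i] * R[i][j] for i in range(3)) for j in range(3)]
```

Here every row of `P` is a Dirac mass, yet the first row of `R` is $(4/7, 2/7, 1/7)$: with $\xi$ the counting measure, the single kernel $R_a$ witnesses $\xi$-irreducibility even though no individual power $P^k$ charges all states.
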


Exercise 5.2

(i) Check the assertions of the preceding remark.
(ii) Using the notation of Remark 5.1, show that for all $m \in \mathbb{N}^*$, $T_m$ has a negative
binomial distribution with parameters $(a, m)$, i.e.,

$$\mathbb{P}(T_m = k) = \binom{k + m - 1}{m - 1} a^k (1 - a)^m$$

for all $k \in \mathbb{N}$. Let $Y_n^m = X_{T_{nm}}$. Show that $(Y_n^m)_n$ is a Markov chain with kernel
$R_a^m$.

Example 5.3 (Doeblin Condition) Suppose that, for some nonzero measure
$\xi$, $R_a(x, A) \ge \xi(A)$ for all $x \in M$ and $A \in \mathcal{B}(M)$. Then $P$ is $\xi$-irreducible.


Example 5.4 (Countable Chains) If $M$ is countable and $P$ is irreducible in the usual
sense (see Chap. 2), then it is $\xi$-irreducible for $\xi = \sum_{x \in M} \delta_x$.


Theorem 5.5 Suppose that $P$ is $\xi$-irreducible. Then $P$ admits at most one invariant
probability measure.


Proof The assumption implies that $\xi$ is absolutely continuous with respect to every
invariant probability measure, but since distinct ergodic measures are mutually
singular (Proposition 4.29), there is at most one ergodic probability measure. If $M$
is a Borel subset of a Polish space, the ergodic decomposition theorem 4.51 implies
the result.

For a general $M$ (which does not even have to be a metric space but just a
measurable set), we cannot rely on ergodic decomposition but can proceed as
follows. Let us first observe that any two invariant probability measures $\mu$, $\nu$ are
equivalent, i.e., their null sets coincide. Indeed, by Lemma 4.26, the singular part
of $\nu$ with respect to $\mu$ is either 0 or a nonzero invariant measure. The latter case
is impossible because $\xi$ is absolutely continuous with respect to any invariant
probability measure. Thus, $\nu = h\mu$ with $h \in L^1(\mu)$. As shown in the proof of
Proposition 4.29 (ii), for all $a > 0$ the measure $\mu_a = (h \wedge a)\mu$ is also invariant.
Thus $(a - h \wedge a)\mu$ is either 0 or invariant. In the first case, $\mu(\{h \ge a\}) = 1$. In the
second case, $\mu(\{h \ge a\}) = 0$ because $(a - h \wedge a)\mu$ and $\mu$, both being invariant, are
equivalent. This proves that $h$ is $\mu$-almost surely constant. Thus $\mu = \nu$. □


5.2 The Accessible Set 


With the exception of a few particular cases (such as Examples 5.3 and 5.4) it is
in general not an easy task to verify that a Markov chain is $\xi$-irreducible. A purely
topological notion of irreducibility is defined below. Combined with the existence
of certain points satisfying a local Doeblin condition (see Chap. 6), this will ensure
$\xi$-irreducibility.

Recall that the (topological) support of a measure $\mu$ is the closed set $\mathrm{supp}(\mu)$
defined as the intersection of all closed sets $F \subset M$ such that $\mu(M \setminus F) = 0$. It
enjoys the following properties:


(a) $\mu(M \setminus \mathrm{supp}(\mu)) = 0$;
(b) $x \in \mathrm{supp}(\mu)$ if and only if $\mu(O) > 0$ for every open set $O$ containing $x$.


Exercise 5.6 Prove that assertions (a), (b) above hold in any separable metric 
space. Use the fact that such a space has a countable basis of open sets (see 
Exercise 4.45 (i)). 


We define the set of points that are accessible from $x \in M$ (for $P$) as

$$\Gamma_x = \mathrm{supp}(R_a(x, \cdot)).$$

Equivalently, $y$ is accessible from $x$ if for every neighborhood $U$ of $y$ there exists
$k \ge 0$ such that $P^k(x, U) > 0$.

For $C \subset M$, we let $\Gamma_C = \bigcap_{x \in C} \Gamma_x$ denote the set of points that are accessible
from $C$ and $\Gamma := \Gamma_M$ the set of accessible points. Note that $\Gamma_C$ is a closed (but
possibly empty) set. We say that $P$ is (topologically) indecomposable if $\Gamma \ne \emptyset$.
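
On a finite state space with the discrete topology, $\Gamma_x$ is simply the set of states reachable from $x$ in $k \ge 0$ steps, so $\Gamma$ can be computed by a graph search. The sketch below is a hypothetical illustration: the matrices `P1` and `P2` and the helpers `reachable` and `accessible_set` are assumptions made for the example.

```python
def reachable(P, x):
    """Gamma_x for a finite chain: all y with P^k(x, y) > 0 for some k >= 0."""
    seen, stack = {x}, [x]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def accessible_set(P):
    """Gamma: intersection of Gamma_x over all starting points x."""
    return set.intersection(*(reachable(P, x) for x in range(len(P))))

# Hypothetical chain: 0 -> {0, 1}, 1 -> 2, and {2, 3} is a closed class.
P1 = [[0.5, 0.5, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0],
      [0.0, 0.0, 1.0, 0.0]]

# Two absorbing states: no point is accessible from everywhere.
P2 = [[1.0, 0.0],
      [0.0, 1.0]]

gamma_1 = accessible_set(P1)   # {2, 3}
gamma_2 = accessible_set(P2)   # set()
```

The first chain is indecomposable, with accessible set equal to its unique closed class, while the second is not indecomposable and indeed has two distinct invariant probability measures.
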
Remark 5.7 If $P$ is $\xi$-irreducible, then it is indecomposable and

$$\mathrm{supp}(\xi) \subset \Gamma.$$

The converse implication is false in general (see Theorem 5.5 and Remark 5.10) but
true for strong Feller chains (see Proposition 5.17).


Proposition 5.8 Assume $P$ is Feller and topologically indecomposable. Then

(i) $P(x, \Gamma) = 1$ for all $x \in \Gamma$;
(ii) $\Gamma \subset \mathrm{supp}(\mu)$ for all $\mu \in \mathrm{Inv}(P)$;
(iii) If $\Gamma$ has nonempty interior, $\mathrm{supp}(\mu) = \Gamma$ for all $\mu \in \mathrm{Inv}(P)$;
(iv) If $\Gamma$ is compact, there exists $\mu \in \mathrm{Inv}(P)$ such that $\mathrm{supp}(\mu) = \Gamma$;
(v) If $\Gamma$ is compact and $g : \Gamma \to \mathbb{R}$ is a continuous and harmonic function on $\Gamma$
(i.e., $Pg(x) = g(x)$ for all $x \in \Gamma$), then $g$ is constant.


Proof

(i) Let $x \in \Gamma$. It is enough to prove that $\mathrm{supp}(P(x, \cdot)) \subset \Gamma$. Let
$x^* \in \mathrm{supp}(P(x, \cdot))$ and $O$ an open set containing $x^*$. Then $\delta := P(x, O) > 0$.
By Feller continuity and the Portmanteau theorem 4.1,
$V := \{y \in M : P(y, O) > \delta/2\}$ is an open set containing $x$. Let $z \in M$ and $k \in \mathbb{N}$ be
such that $P^k(z, V) > 0$ (recall that $x \in \Gamma$). Then

$$P^{k+1}(z, O) \ge \int_V P^k(z, dy) P(y, O) \ge \frac{\delta}{2} P^k(z, V) > 0.$$

This proves that $x^* \in \Gamma$.

(ii) Let $x \in \Gamma$, $U$ a neighborhood of $x$, and $\mu$ an invariant probability measure.
Then $\mu(U) = \int \mu(dy) R_a(y, U) > 0$.

(iii) By invariance, $\mu(\Gamma) = \int_\Gamma \mu(dx) R_a(x, \Gamma) + \int_{\Gamma^c} \mu(dx) R_a(x, \Gamma)$, and since, by
(i), $R_a(x, \Gamma) = 1$ for all $x \in \Gamma$, it follows that $\int_{\Gamma^c} \mu(dx) R_a(x, \Gamma) = 0$. If
furthermore $\Gamma$ has nonempty interior, then $R_a(x, \Gamma) > 0$ for all $x$, so that
$\mu(\Gamma^c) = 0$. This proves that $\mathrm{supp}(\mu) \subset \Gamma$.

(iv) By (i), Feller continuity, and Theorem 4.20, there exists an invariant
probability measure $\mu$ with $\mu(\Gamma) = 1$; hence the result.

(v) By (i) we can assume without loss of generality that $\Gamma = M$. By compactness,
accessibility, and Feller continuity, for every open set $O \subset M$ there exist a
finite cover of $M$ by open sets $U_1, \dots, U_k$, integers $n_1, \dots, n_k$, and $\delta > 0$
such that $P^{n_i}(x, O) \ge \delta$ for all $x \in U_i$, $1 \le i \le k$. Thus
$\mathbb{P}_x(\tau_O > n) \le (1 - \delta)$ for $n = \max(n_1, \dots, n_k)$, hence
$\mathbb{P}_x(\tau_O > kn) \le (1 - \delta)^k$ by the Markov property. Thus $\mathbb{P}_x(\tau_O < \infty) = 1$.
The assumption that $g$ is harmonic makes $(g(X_n))$ a bounded martingale. It then
converges $\mathbb{P}_x$-almost surely. If $g$ is nonconstant, there exist $a < b$ such that
$\{g < a\}$ and $\{g > b\}$ are nonempty open sets, and, by what precedes, $(X_n)$ visits
these sets infinitely often $\mathbb{P}_x$-almost surely, a contradiction. □
Remark 5.9 The inclusion $\Gamma \subset \mathrm{supp}(\mu)$ does not require Feller continuity.


Remark 5.10 The inclusion $\Gamma \subset \mathrm{supp}(\mu)$ may be strict when $\Gamma$ has empty
interior as shown by the following exercise. Other examples where the inclusion
$\Gamma \subset \mathrm{supp}(\mu)$ is strict can be found in [8] and [9].


Exercise 5.11 Let $F : \{0, 1\} \times [0, 1] \to [0, 1]$ be the map defined by

$$F(0, x) = ax, \qquad F(1, x) = bx(1 - x),$$

where $0 < a < 1$ and $1 < b < 4$. Let $(X_n)$ be the Markov chain on $[0, 1]$ defined
by $X_{n+1} = F(\theta_{n+1}, X_n)$, $X_0 = x > 0$, where $(\theta_n)$ is an i.i.d. Bernoulli sequence
with distribution $(1 - p)\delta_0 + p\delta_1$ for some $0 < p < 1$. Show that $\Gamma = \{0\}$ and that
when $(1 - p)\log a + p \log b > 0$, there exists an invariant probability measure $\mu$
such that $\mu(\{0\}) = 0$, hence $\mathrm{supp}(\mu) \ne \Gamma$.


In case P is uniquely ergodic on a compact set, it is topologically indecompos- 
able. 


Proposition 5.12 Suppose $M$ is compact, $P$ is Feller and uniquely ergodic with
$\mathrm{Inv}(P) = \{\mu\}$. Then $P$ is indecomposable and $\Gamma = \mathrm{supp}(\mu)$.


Proof By Proposition 5.8 it suffices to prove that $\Gamma$ is nonempty. By Theorem 4.20,
$\frac{1}{n} \sum_{k=1}^{n} P^k(x, \cdot) \Rightarrow \mu$ for all $x \in M$. Hence, for any open set $O$ such that
$\mu(O) > 0$, $\liminf_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} P^k(x, O) > 0$. Thus $R_a(x, O) > 0$. □


A partial converse to Proposition 5.12 is the following result. Recall that $L_b(M)$
is the set of real-valued bounded Lipschitz functions on $M$.


Proposition 5.13 Assume that $M$ is compact, $P$ is Feller, $\Gamma$ has nonempty interior,
and for all $f \in L_b(M)$ the sequence $(P^n f)_{n \ge 1}$ is equicontinuous. Then $P$ is
uniquely ergodic.


Proof By equicontinuity of $(P^n f)_{n \ge 1}$, the sequence $(\bar{f}_n)_{n \ge 1}$ defined by

$$\bar{f}_n = \frac{1}{n} \sum_{k=1}^{n} P^k f$$

is also equicontinuous, hence relatively compact in $C_b(M)$ by the Arzelà-Ascoli
theorem. Let $g$ be a limit point of $(\bar{f}_n)_{n \ge 1}$. Then $g$ is continuous and $Pg = g$.
By Proposition 5.8 (v), $g|_\Gamma$ is a constant $C_f$. Let now $\mu$ and $\nu$ be two invariant
probability measures. Then $\mu P f = \mu f$ implies that $\mu(\bar{f}_n) = \mu(f)$. Therefore
$\mu(f) = \mu(g) = \mu(g|_\Gamma) = C_f$. Similarly $\nu(f) = C_f$. This proves that $\mu = \nu$. □


Exercise 5.14 Deduce from Proposition 5.13 that the irrational rotation $T_\alpha$ (see
Exercise 4.37) is uniquely ergodic.

Exercise 5.15 Let $M$ be a compact space. Using the notation of Chap. 3 and
Sect. 4.5.1, consider the Markov chain on $M$ recursively defined by

$$X_{n+1} = F_{\theta_{n+1}}(X_n).$$

Assume that $\Theta$ is a metric space, $(\theta, x) \mapsto F_\theta(x)$ is continuous, and for each $\theta \in \Theta$,
$F_\theta$ is Lipschitz with Lipschitz constant $l_\theta$. Assume furthermore that

(i) $\int l_\theta \, m(d\theta) \le 1$ (compare with the condition of Theorem 4.31);
(ii) For every $x \in M$ and every open set $O \subset M$, there exists a sequence $\theta_1, \dots, \theta_n$
with $\theta_i \in \mathrm{supp}(m)$, $1 \le i \le n$, such that $F_{\theta_n} \circ \dots \circ F_{\theta_1}(x) \in O$.

Show that $(X_n)$ is uniquely ergodic.


Remark 5.16 It is important to emphasize here that the condition that $\Gamma$ has
nonempty interior is not sufficient to ensure uniqueness of the invariant probability
measure. For instance, Furstenberg, in a remarkable work [29] (see also [48]), has
shown that for a suitable choice of $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ and $\beta$ a smooth map on $S^1 := \mathbb{R}/\mathbb{Z}$,
the diffeomorphism

$$T : S^1 \times S^1 \to S^1 \times S^1, \quad (x, y) \mapsto (x + \alpha,\ y + \beta(x))$$

is minimal (i.e., all the orbits are dense) but not uniquely ergodic.

Another example is given by the Ising model on $\mathbb{Z}^2$. This is a Feller Markov
process on the compact set $M = \{-1, 1\}^{\mathbb{Z}^2}$ for which all points are accessible
(i.e., $\Gamma = M$) and which admits (at low temperature) several invariant probability
measures. See Example 2.3 in [33] for a discussion and further references.


Recall that a function $f : M \to \mathbb{R}$ is lower semicontinuous (respectively, upper
semicontinuous) at a point $x_0 \in M$ if

$$f(x_0) \le \liminf_{x \to x_0} f(x), \quad \text{resp.} \quad f(x_0) \ge \limsup_{x \to x_0} f(x).$$

Clearly, $f$ is continuous at a point $x_0 \in M$ if and only if $f$ is both upper and lower
semicontinuous at $x_0$.


Proposition 5.17 Suppose that $P$ is topologically indecomposable and that for
some $x^* \in \Gamma$ and all $A \in \mathcal{B}(M)$, $x \mapsto P(x, A)$ is lower semicontinuous at $x^*$. Then
$P$ is $\xi$-irreducible for $\xi = P(x^*, \cdot)$. In particular $P$ admits at most one invariant
probability measure.


Proof Let $A$ be such that $P(x^*, A) > 0$. Then for all $x \in M$ there exist a
neighborhood $O$ of $x^*$ and $n \ge 0$ such that $P^n(x, O) > 0$ and $P(y, A) > 0$
for all $y \in O$ (by lower semicontinuity of $x \mapsto P(x, A)$ at $x^*$). Thus
$P^{n+1}\mathbf{1}_A(x) \ge \int_O P^n(x, dy) P(y, A) > 0$. □
Note that the assumption that $x \mapsto P(x, A)$ is lower semicontinuous at $x^*$ is
automatically satisfied if $P$ is strong Feller. Hence Proposition 5.17 gives a practical
tool to ensure that a strong Feller chain is uniquely ergodic. Another result about
strong Feller chains is the following.


Proposition 5.18 Suppose that P is strong Feller. Then 


(i) Two distinct ergodic measures have disjoint support;
(ii) The support of an invariant non-ergodic probability measure is disconnected;
(iii) If $M$ is connected and $P$ has an invariant probability measure having full
support, then $P$ is uniquely ergodic.


Proof 


(i) Let $\mu, \nu$ be two distinct ergodic measures. By Proposition 4.29 they are
mutually singular. Hence there exists a Borel set $A \subset M$ such that $\mu(A) = 1$
and $\nu(A) = 0$. The set $\{x \in M : P(x, A) = 1\}$ is closed (strong Feller
property) and has $\mu$-measure 1 because $1 = \mu(A) = \int \mu(dx) P(x, A)$. Thus
$\mathrm{supp}(\mu) \subset \{x \in M : P(x, A) = 1\}$. Similarly $\mathrm{supp}(\nu) \subset \{x \in M :
P(x, M \setminus A) = 1\}$. These two sets are disjoint, hence so are the supports.

(ii) Let $\mu$ be invariant and non-ergodic, and let $A$ be such that $P\mathbf{1}_A = \mathbf{1}_A$, $\mu$-almost surely, and $0 <
\mu(A) < 1$. Set $f = P\mathbf{1}_A$. Then $f(x) \in \{0, 1\}$ for $\mu$-almost every $x$ and, by
the strong Feller property, $f$ is continuous. Thus $f$ restricted to $\mathrm{supp}(\mu)$ takes
values in $\{0, 1\}$. If now $\mathrm{supp}(\mu)$ were connected, then $f$ restricted to $\mathrm{supp}(\mu)$ would be
constant and $\mu(A) \in \{0, 1\}$, a contradiction.

(iii) This follows from (i) and (ii). □
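
As a hedged numerical illustration of Proposition 5.18 (i): on a finite state space with the discrete topology every Markov kernel is strong Feller, because every function is continuous. The sketch below uses a hypothetical 4-state transition matrix (not taken from the text) with two closed classes, and approximates the two ergodic measures by iterating the kernel. The helper names `push`, `limit_law`, and `support` are illustrative assumptions.

```python
def push(mu, P):
    """Return the row vector mu P, i.e. (mu P)(j) = sum_i mu(i) P(i, j)."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(P[0]))]

def limit_law(start, P, steps=200):
    """Approximate lim_n delta_start P^n by repeated multiplication.

    This converges here because each closed class of the example is aperiodic.
    """
    mu = [1.0 if i == start else 0.0 for i in range(len(P))]
    for _ in range(steps):
        mu = push(mu, P)
    return mu

def support(mu, tol=1e-12):
    """States carrying mass above a small numerical tolerance."""
    return {i for i, m in enumerate(mu) if m > tol}

# Hypothetical example: two closed classes {0, 1} and {2, 3}.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.6, 0.4],
]
pi_a = limit_law(0, P)  # ergodic measure carried by {0, 1}
pi_b = limit_law(2, P)  # ergodic measure carried by {2, 3}
print(support(pi_a), support(pi_b))
```

In this toy case the two ergodic measures are $(2/7, 5/7, 0, 0)$ and $(0, 0, 6/13, 7/13)$, and their supports are disjoint, as the proposition predicts.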


5.2.1 Continuous Time: Accessibility 


For a continuous-time semigroup $\{P_t\}_{t \ge 0}$ one defines, by analogy with the discrete-time
setting, the set of points that are accessible from $x \in M$ (for $\{P_t\}_{t \ge 0}$) as

$$\Gamma_x = \mathrm{supp}(G(x, \cdot)),$$

where $G$ is the 1-resolvent (see Eq. (4.18)).


Proposition 5.19 Suppose $\{P_t\}_{t \ge 0}$ is weakly Feller and let $x, y \in M$. Then the
following assertions are equivalent:

(i) The point $y$ is accessible from $x$ for $\{P_t\}_{t \ge 0}$;
(ii) The point $y$ is accessible from $x$ for $G$;
(iii) For every neighborhood $U$ of $y$ there exists $t \ge 0$ such that $P_t(x, U) > 0$.

Proof Clearly (i) $\Rightarrow$ (ii) and (ii) $\Rightarrow$ (iii) because

$$G^n(f) = \int_0^\infty \gamma_n(t) P_t f \, dt,$$

where $\gamma_n(t) = \frac{t^{n-1} e^{-t}}{(n-1)!}$. To prove that (iii) $\Rightarrow$ (i) suppose that $P_t(x, U) > 0$ for
some $t \ge 0$. By the weak Feller property, $P_{s+t}(x, \cdot) \Rightarrow P_t(x, \cdot)$ as $s \downarrow 0$. Thus, by
the Portmanteau theorem 4.1, $\liminf_{s \downarrow 0} P_{t+s}(x, U) > 0$. Hence $G(x, U) > 0$. □


5.3 The Asymptotic Strong Feller Property 


The asymptotic strong Feller property was introduced in [34] by Hairer and
Mattingly to prove uniqueness for the invariant probability measure of the Navier-Stokes
equation on the two-dimensional torus, subject to degenerate stochastic
forcing. Before we define this property, we introduce some notation.

Let $(M, d^*)$ be a separable metric space, with $\mathcal{P}(M)$ the space of probability
measures on $(M, \mathcal{B}(M))$. One important idea in this section is to consider a whole
family of metrics on $M$, but throughout, $d^*$ will be the metric that gives rise to the
topology on $M$, and in particular induces the $\sigma$-field $\mathcal{B}(M)$.

For any bounded metric $d$ on $M$, we let $\mathrm{Lip}_1(d)$ denote the set of $\mathcal{B}(M)$-measurable
functions $\varphi : M \to \mathbb{R}$ such that

$$|\varphi(x) - \varphi(y)| \le d(x, y), \quad \forall x, y \in M.$$

Notice that $\mathrm{Lip}_1(d)$ contains all constant functions. If the metric $d$ is continuous
with respect to the topology induced by $d^*$ and if $\mathcal{B}_d(M)$ denotes the Borel $\sigma$-field
with respect to $d$, then $\mathrm{Lip}_1(d)$ is equal to the set of $\mathcal{B}_d(M)$-measurable functions
$\varphi : M \to \mathbb{R}$ such that $|\varphi(x) - \varphi(y)| \le d(x, y)$ for all $x, y \in M$. For $\mu, \nu \in \mathcal{P}(M)$,
we define

$$\|\mu - \nu\|_d := \sup_{\varphi \in \mathrm{Lip}_1(d)} (\mu\varphi - \nu\varphi).$$

Boundedness of $d$ guarantees that every function in $\mathrm{Lip}_1(d)$ is bounded and thus
integrable with respect to any Borel probability measure on $M$.


Exercise 5.20 Let $d^*$ be bounded. Show that $(\mu, \nu) \mapsto \|\mu - \nu\|_{d^*}$ defines a
bounded metric on $\mathcal{P}(M)$.


Remark 5.21 If $\delta(x, y) := \mathbf{1}_{x \ne y}$ is the discrete metric, then

$$\|\mu - \nu\|_\delta = \tfrac{1}{2}|\mu - \nu| := \tfrac{1}{2} \sup\{|\mu f - \nu f| : f \in B(M), \ \|f\|_\infty \le 1\},$$

where $|\mu - \nu|$ is the so-called total variation distance between $\mu$ and $\nu$. The latter
will play a key role in Chap. 8.
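
The identity in Remark 5.21 can be checked numerically on a finite state space. The sketch below is only an illustration under that finiteness assumption: functions in $\mathrm{Lip}_1(\delta)$ may be shifted to take values in $[0, 1]$ without changing $\mu\varphi - \nu\varphi$, so the supremum reduces to a maximum over indicator functions, which is brute-forced here and compared with half the $\ell^1$ distance. The function names and the two probability vectors are hypothetical.

```python
from itertools import combinations

def delta_norm(mu, nu):
    """||mu - nu||_delta on a finite state space: half the l1 distance."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

def delta_norm_bruteforce(mu, nu):
    """max over subsets A of mu(A) - nu(A), by enumerating all subsets."""
    states = range(len(mu))
    best = 0.0
    for r in range(len(mu) + 1):
        for subset in combinations(states, r):
            best = max(best, sum(mu[i] - nu[i] for i in subset))
    return best

# Hypothetical probability vectors on four states.
mu = [0.5, 0.3, 0.2, 0.0]
nu = [0.1, 0.3, 0.2, 0.4]
print(delta_norm(mu, nu), delta_norm_bruteforce(mu, nu))
```

Both computations give the same value up to rounding, which is the factor $\tfrac{1}{2}$ relation between $\|\cdot\|_\delta$ and the total variation distance stated in the remark.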


We call a metric $d$ on $M$ continuous if it is continuous as a function from $M \times M$
to $[0, \infty)$, where $M \times M$ has the topology induced by the product metric
$(d^* \times d^*)((x, y), (x', y')) := d^*(x, x') + d^*(y, y')$. Notice in particular that $d^*$ itself is
continuous. A sequence of metrics $(d_n)_{n \ge 1}$ on $M$ is called nondecreasing if for
every $n \in \mathbb{N}^*$,

$$d_{n+1}(x, y) \ge d_n(x, y), \quad \forall x, y \in M.$$

Recall that $\delta(x, y) := \mathbf{1}_{x \ne y}$ and that $\delta_x$ is the Dirac measure that assigns mass 1 to
$\{x\}$.


Definition 5.22 (Hairer, Mattingly) We say that a Markov kernel $P$ on $M$ is
asymptotically strong Feller at $x \in M$ if there exist a nondecreasing sequence
$(n_k)_{k \ge 1}$ of positive integers and a nondecreasing sequence $(d_k)_{k \ge 1}$ of continuous
metrics on $M$ such that

$$\lim_{k \to \infty} d_k(y, z) = \delta(y, z), \quad \forall y, z \in M,$$

and

$$\inf\Big\{ \limsup_{k \to \infty} \sup_{y \in U} \|\delta_x P^{n_k} - \delta_y P^{n_k}\|_{d_k} : U \text{ open}, \ x \in U \Big\} = 0.$$

We call $P$ asymptotically strong Feller if it is asymptotically strong Feller at every
$x \in M$.


Since $(d_k)_{k \ge 1}$ is nondecreasing and converges to a bounded metric, each metric
$d_k$ is, of course, bounded.


5.3.1 Strong Feller Implies Asymptotic Strong Feller 


In this subsection, we show that every strong Feller Markov kernel also has the 
asymptotic strong Feller property. The proof of this statement makes use of the ultra 
Feller property, which we now define. A Markov kernel P on M is called ultra 
Feller if the mapping $x \mapsto \delta_x P$ is continuous with respect to the total variation
distance (see Remark 5.21). In particular, every ultra Feller Markov kernel is strong 
Feller. The following statement corresponds to Theorem 1.6.6 in [32]. It is due to 
Dellacherie and Meyer, see [18]. 


Proposition 5.23 Let P and Q be strong Feller Markov kernels on M. Then the 
Markov kernel P Q is ultra Feller. 
The proof of Proposition 5.23 we present here is taken from [32]. It is an
adaptation of an argument due to Seidler. We begin by stating two lemmas. 


Lemma 5.24 Let $P$ be a strong Feller Markov kernel on $M$. Then there exists
$\pi \in \mathcal{P}(M)$ such that $P(x, \cdot) \ll \pi$ for every $x \in M$.


Proof Since $M$ is separable, there is a dense sequence $(x_n)_{n \ge 1}$ of elements of $M$.
We define the probability measure

$$\pi(A) := \sum_{n=1}^{\infty} 2^{-n} P(x_n, A), \quad A \in \mathcal{B}(M).$$

To obtain a contradiction, assume there is $x \in M$ such that $P(x, \cdot)$ is not absolutely
continuous with respect to $\pi$. Then there is $A \in \mathcal{B}(M)$ such that $\pi(A) = 0$ and
$P(x, A) > 0$. Let $f := \mathbf{1}_A \in B(M)$. Since $P$ is strong Feller, $Pf$ is continuous. We
have $Pf(x) = P(x, A) > 0$. Since $\pi(A) = 0$, we have $0 = P(x_n, A) = Pf(x_n)$
for every $n \in \mathbb{N}^*$. But then continuity of $Pf$ and the fact that $(x_n)$ is dense in $M$
imply that $Pf = 0$, a contradiction. □


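The following real-analysis lemma corresponds to Corollary 1.6.3 in [32]. Recall
from the proof of Lemma 4.44 in Sect. 4.6 that a $\sigma$-field $\mathcal{F}$ is called countably
generated if there exists a countable family of sets $\{A_n\}_{n \in \mathbb{N}}$ such that
$\mathcal{F} = \sigma(A_n : n \in \mathbb{N})$.


Lemma 5.25 Let $(\Omega, \mathcal{F}, \pi)$ be a measure space such that $\mathcal{F}$ is countably generated.
Let $(\varphi_n)_{n \ge 1}$ be a bounded sequence in $L^\infty(\Omega, \mathcal{F}, \pi)$. Then there exist a
subsequence $(\varphi_{n_k})_{k \ge 1}$ and $\varphi \in L^\infty(\Omega, \mathcal{F}, \pi)$ such that

$$\lim_{k \to \infty} \int_\Omega \varphi_{n_k}(\omega) f(\omega) \, \pi(d\omega) = \int_\Omega \varphi(\omega) f(\omega) \, \pi(d\omega), \quad \forall f \in L^1(\Omega, \mathcal{F}, \pi).$$


Proof The space $L^\infty(\Omega, \mathcal{F}, \pi)$ being the dual of $L^1(\Omega, \mathcal{F}, \pi)$, its unit ball is
compact for the weak* topology by the Banach-Alaoglu theorem. Furthermore,
the assumption that $\mathcal{F}$ is countably generated makes $L^1(\Omega, \mathcal{F}, \pi)$ separable (see
Exercise 4.46). Thus, the unit ball of $L^\infty(\Omega, \mathcal{F}, \pi)$ is sequentially compact for the
weak* topology. This proves the result. □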


We proceed to the proof of Proposition 5.23. 


Proof (Proposition 5.23) Since $Q$ is strong Feller, Lemma 5.24 yields existence of
a probability measure $\pi$ on $(M, \mathcal{B}(M))$ such that $Q(x, \cdot) \ll \pi$ for every $x \in M$. To
obtain a contradiction, suppose that the kernel $PQ$ is not ultra Feller. Then there are
$x \in M$ and $\varepsilon > 0$ such that for every open neighborhood $U$ of $x$,

$$\sup_{y \in U} \|\delta_x PQ - \delta_y PQ\|_\delta > \varepsilon.$$

For $r > 0$ and $y \in M$, let $B_r(y) := \{z \in M : d^*(y, z) < r\}$ be the open $d^*$-ball of
radius $r$ centered at $y$. Then for every $n \in \mathbb{N}^*$ there is $y_n \in B_{1/n}(x)$ such that

$$\|\delta_x PQ - \delta_{y_n} PQ\|_\delta > \varepsilon.$$

According to Remark 5.21,

$$\sup_{\varphi \in B(M) : \|\varphi\|_\infty \le 1} (PQ\varphi(x) - PQ\varphi(y_n)) > 2\varepsilon, \quad \forall n \in \mathbb{N}^*,$$

where the expression on the left-hand side denotes the total variation distance
between $\delta_x PQ$ and $\delta_{y_n} PQ$. As a result, there is a sequence $(\varphi_n)_{n \ge 1}$ in $B(M)$ such
that $\|\varphi_n\|_\infty \le 1$ and

$$PQ\varphi_n(x) - PQ\varphi_n(y_n) > 2\varepsilon, \quad \forall n \in \mathbb{N}^*. \qquad (5.1)$$

Since $M$ is a separable metric space, Exercise 4.45 (ii) implies that the $\sigma$-field
$\mathcal{B}(M)$ is countably generated. And since $(\varphi_n)$ is a bounded sequence in
$L^\infty(M, \mathcal{B}(M), \pi)$, Lemma 5.25 implies that there exist a subsequence $(\varphi_{n_k})_{k \ge 1}$
and a function $\varphi \in L^\infty(M, \mathcal{B}(M), \pi)$ such that

$$\lim_{k \to \infty} \int_M \varphi_{n_k}(x) f(x) \, \pi(dx) = \int_M \varphi(x) f(x) \, \pi(dx), \quad \forall f \in L^1(M, \mathcal{B}(M), \pi).$$

Since $Q(x, \cdot) \ll \pi$ for every $x \in M$, we have that for every $x \in M$ there is
$h_x \in L^1(M, \mathcal{B}(M), \pi)$ with $Q(x, dy) = h_x(y) \, \pi(dy)$. Then, for every $x \in M$,

$$\lim_{k \to \infty} Q\varphi_{n_k}(x) = Q\varphi(x).$$

To keep notation short, set $\psi_k := Q\varphi_{n_k}$ for every $k \in \mathbb{N}^*$, and set $\psi := Q\varphi$. We
also introduce the functions $(\rho_j)_{j \ge 1}$ defined by

$$\rho_j(x) := \sup_{k \ge j} |\psi_k(x) - \psi(x)|, \quad x \in M,$$

and note that $\lim_{j \to \infty} \rho_j(x) = 0$ for every $x \in M$. For every $k \ge 1$,

$$\|\psi_k\|_\infty \le \|\varphi_{n_k}\|_\infty \le 1 \quad \text{and} \quad \|\rho_k\|_\infty \le \|\psi\|_\infty + \sup_{l \ge k} \|\psi_l\|_\infty \le \|\varphi\|_\infty + 1,$$

so bounded convergence implies that

$$\lim_{k \to \infty} P\psi_k(x) = P\psi(x) \qquad (5.2)$$

and

$$\lim_{j \to \infty} P\rho_j(x) = 0$$

for every $x \in M$. For every $m \in \mathbb{N}^*$,

$$\limsup_{j \to \infty} P\rho_j(y_{n_j}) \le \limsup_{j \to \infty} P\rho_m(y_{n_j}) = P\rho_m(x)$$

because $(\rho_j)$ is a nonincreasing sequence of nonnegative functions in $B(M)$,
$\lim_{j \to \infty} y_{n_j} = x$, and $P$ is strong Feller. Since the estimate above holds for every
$m \in \mathbb{N}^*$ and since $\lim_{m \to \infty} P\rho_m(x) = 0$, it follows that

$$\lim_{j \to \infty} P\rho_j(y_{n_j}) = 0. \qquad (5.3)$$

Consequently,

$$\limsup_{k \to \infty} \big(PQ\varphi_{n_k}(x) - PQ\varphi_{n_k}(y_{n_k})\big)$$
$$\le \limsup_{k \to \infty} |P\psi_k(x) - P\psi(x)| + \limsup_{k \to \infty} |P\psi(x) - P\psi(y_{n_k})| + \limsup_{k \to \infty} |P\psi(y_{n_k}) - P\psi_k(y_{n_k})|$$
$$\le \limsup_{k \to \infty} P\rho_k(y_{n_k}) = 0,$$

where we used (5.2), the assumption that $P$ is strong Feller, and (5.3). This
contradicts (5.1). □


We are now ready to state and prove the main result of this subsection. 


Proposition 5.26 Let P be a Markov kernel on a separable metric space (M, d*). 
If P is strong Feller, then it is also asymptotically strong Feller. 


Proof Consider the sequence of continuous metrics

$$d_k(x, y) = 1 \wedge (k \, d^*(x, y)), \quad k \in \mathbb{N}^*,$$

where $a \wedge b$ denotes the minimum of $a$ and $b$. The sequence is clearly nondecreasing,
and

$$\lim_{k \to \infty} d_k(x, y) = \delta(x, y), \quad \forall x, y \in M.$$

If $P$ is strong Feller, then Proposition 5.23 implies that $P^2$ is ultra Feller. Therefore

$$0 = \inf\Big\{ \sup_{y \in U} \|\delta_x P^2 - \delta_y P^2\|_\delta : U \text{ open}, \ x \in U \Big\}. \qquad (5.4)$$

Since $(d_k)_{k \ge 1}$ is nondecreasing and converges pointwise to $\delta$, the sequence of
functions $f_k(y) := \|\delta_x P^2 - \delta_y P^2\|_{d_k}$ is nondecreasing and dominated by
$f(y) := \|\delta_x P^2 - \delta_y P^2\|_\delta$. Thus, for every open neighborhood $U$ of $x$,

$$\limsup_{k \to \infty} \sup_{y \in U} f_k(y) \le \sup_{y \in U} \lim_{k \to \infty} f_k(y) \le \sup_{y \in U} f(y).$$

Together with (5.4) and $n_k := 2$ for all $k \ge 1$, this yields

$$\inf\Big\{ \limsup_{k \to \infty} \sup_{y \in U} \|\delta_x P^{n_k} - \delta_y P^{n_k}\|_{d_k} : U \text{ open}, \ x \in U \Big\} = 0.$$

□
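
The metrics used in this proof are easy to tabulate. The following sketch assumes, purely for illustration, that $M = \mathbb{R}$ with $d^*(x, y) = |x - y|$; the function names are hypothetical. It lists $d_k(x, y) = 1 \wedge (k \, d^*(x, y))$ for a fixed pair of distinct points and shows the sequence increasing to the discrete metric.

```python
def d_star(x, y):
    """Assumed base metric: the usual distance on the real line."""
    return abs(x - y)

def d_k(k, x, y):
    """Truncated, rescaled metric d_k = 1 ∧ (k d*)."""
    return min(1.0, k * d_star(x, y))

def delta(x, y):
    """Discrete metric: 0 on the diagonal, 1 elsewhere."""
    return 0.0 if x == y else 1.0

x, y = 0.0, 0.125
values = [d_k(k, x, y) for k in range(1, 12)]
print(values)          # increases in steps of 0.125, then stays at 1.0
print(delta(x, y))     # the pointwise limit
```

For points at distance $0.125$ the value $1$ is reached at $k = 8$; for points that are closer together a larger $k$ is needed, which is why the convergence to $\delta$ is only pointwise.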


Remark 5.27 If $P$ is a Markov kernel on a separable metric space such that $P^n$ is
strong Feller for some $n \in \mathbb{N}^*$, then $P$ is asymptotically strong Feller. This follows
if one replaces $P^2$ in the proof of Proposition 5.26 with $P^{2n}$.


The following exercise shows that the converse of Proposition 5.26 does not hold, 
i.e., there are Markov kernels which are asymptotically strong Feller but not strong 
Feller. 


Exercise 5.28 Consider the mapping

$$F : \mathbb{R}^2 \to \mathbb{R}^2, \quad (x_1, x_2) \mapsto (x_2, x_1).$$

For $(x, \theta) \in \mathbb{R}^2 \times \mathbb{R}$, set $F_\theta(x) := F(x) + \theta e_1$, where $e_1 := (1, 0)^\top$ (cf.
Exercise 6.11 (ii) in Sect. 6.2). Let $m$ be a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ that is
absolutely continuous with respect to Lebesgue measure.

(i) Show that the Markov kernel $P$ corresponding to the random dynamical
system $(F, m)$ is not strong Feller. Hint: Consider for instance the function
$f(x_1, x_2) := \mathbf{1}_{x_2 \ge 0}$.

(ii) Use the result from Exercise 6.10 in Sect. 6.2 to show that $P^2$ is strong Feller,
and conclude that $P$ is asymptotically strong Feller.
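
A small numerical experiment may help build intuition for this exercise; it is not a proof. The sketch assumes that $m$ is the standard normal law and replaces it by a fixed sample of 300 draws, so the smoothing seen for $P^2$ is only approximate. All function names are hypothetical. With $f = \mathbf{1}_{x_2 \ge 0}$ one gets $Pf(x) = \mathbf{1}_{x_1 \ge 0}$ exactly, which jumps at $x_1 = 0$, whereas $P^2 f(x)$ is the fraction of sampled $\theta$ with $x_2 + \theta \ge 0$.

```python
import random

def F_theta(x, theta):
    """One step of the system: swap the coordinates, then shift the first by theta."""
    x1, x2 = x
    return (x2 + theta, x1)

def apply_P(f, x, thetas):
    """Monte Carlo estimate of Pf(x) = E[f(F_theta(x))] over the sample `thetas`."""
    return sum(f(F_theta(x, th)) for th in thetas) / len(thetas)

def f(x):
    """Indicator of the half-plane {x2 >= 0} from the hint."""
    return 1.0 if x[1] >= 0 else 0.0

rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(300)]  # assumed noise law: N(0, 1)

# Pf jumps from 0 to 1 across x1 = 0, whatever the sample is.
jump_left = apply_P(f, (-1e-9, 0.0), thetas)
jump_right = apply_P(f, (0.0, 0.0), thetas)

def Pf(x):
    return apply_P(f, x, thetas)

# P^2 f barely changes under the same tiny perturbation of x2.
smooth_left = apply_P(Pf, (0.0, -1e-9), thetas)
smooth_right = apply_P(Pf, (0.0, 0.0), thetas)
print(jump_left, jump_right, smooth_left, smooth_right)
```

The first pair of values differs by 1, while the second pair is close to $m([0, \infty)) = 1/2$ under the normality assumption, in line with parts (i) and (ii).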
5.3.2 A Sufficient Condition for the Asymptotic Strong Feller Property


Throughout Sect. 5.3.2, let $H$ be a separable real Hilbert space with norm $\| \cdot \|$,
and let $\|f\|_\infty := \sup_{x \in H} |f(x)|$ for $f \in B(H)$, the set of real-valued bounded
Borel-measurable functions on $H$.


Definition 5.29 A function $f : H \to \mathbb{R}$ is called Fréchet differentiable at a point
$x \in H$ if there exists a bounded linear operator $A : H \to \mathbb{R}$ such that

$$\lim_{\|h\| \to 0} \frac{|f(x + h) - f(x) - Ah|}{\|h\|} = 0.$$

The operator $A$ is uniquely defined by the above condition, and it is called the
Fréchet derivative of $f$ at the point $x$.


Let $\mathcal{F}(H)$ denote the space of bounded functions $f : H \to \mathbb{R}$ that are Fréchet
differentiable and whose Fréchet derivative $\nabla f$ satisfies the following conditions:

(i) $\|\nabla f\|_\infty := \sup_{x \in H} \sup_{h \in H : \|h\| \le 1} |\nabla f(x) h| < \infty$;
(ii) The mapping $x \mapsto \nabla f(x) h$ is continuous for every $h \in H$.

The following statement is a special case of Proposition 3.12 in [34].


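Theorem 5.30 (Hairer, Mattingly) Let $P$ be a Markov kernel on $(H, \mathcal{B}(H))$.
Assume that there exist constants $\alpha \in (0, 1)$ and $C > 0$ such that for every
$f \in \mathcal{F}(H)$, one has $Pf \in \mathcal{F}(H)$ and

$$\|\nabla Pf\|_\infty \le C \|f\|_\infty + \alpha \|\nabla f\|_\infty. \qquad (5.5)$$

Then $P$ is asymptotically strong Feller.

Proof Consider the sequence of continuous metrics

$$d_k(x, y) = 1 \wedge (\alpha^{-k/2} \|x - y\|), \quad k \in \mathbb{N}^*.$$

Similarly to the sequence of metrics defined in the proof of Proposition 5.26, $(d_k)_{k \ge 1}$
is nondecreasing and converges pointwise to the discrete metric $\delta$. Fix $k \in \mathbb{N}^*$ and
$\varphi \in \mathrm{Lip}_1(d_k)$. As explained in Remark 5.31 below, there exists a sequence $(\varphi_n)_{n \ge 1}$
in $\mathcal{F}(H) \cap \mathrm{Lip}_1(d_k)$ such that

$$\lim_{n \to \infty} \varphi_n(x) = \varphi(x), \quad \forall x \in H.$$

For $x \in H$ and $n \in \mathbb{N}^*$, set

$$\tilde{\varphi}(x) := \varphi(x) - \sup_{y \in H} \varphi(y) \quad \text{and} \quad \tilde{\varphi}_n(x) := \varphi_n(x) - \sup_{y \in H} \varphi_n(y).$$

Notice that $\|\tilde{\varphi}\|_\infty = \sup_{y \in H} \varphi(y) - \inf_{y \in H} \varphi(y) \le 1$ because $\varphi \in \mathrm{Lip}_1(d_k)$ and
$d_k \le 1$. Similarly, $\|\tilde{\varphi}_n\|_\infty \le 1$ for every $n \in \mathbb{N}^*$. Since $\varphi_n$ and $\tilde{\varphi}_n$ only differ by a
constant, we also have $\tilde{\varphi}_n \in \mathcal{F}(H) \cap \mathrm{Lip}_1(d_k)$ for every $n \in \mathbb{N}^*$. It is then not hard
to see that

$$\|\nabla \tilde{\varphi}_n\|_\infty \le \alpha^{-k/2}, \quad \forall n \in \mathbb{N}^*.$$

Now, fix $x, y \in H$ and define

$$\gamma(s) := (1 - s) y + s x, \quad s \in [0, 1].$$

By assumption, $P^k \tilde{\varphi}_n \in \mathcal{F}(H)$ for every $n \in \mathbb{N}^*$. By the chain rule for the Fréchet
derivative, the function $P^k \tilde{\varphi}_n \circ \gamma$ is differentiable with

$$(P^k \tilde{\varphi}_n \circ \gamma)'(s) = \nabla P^k \tilde{\varphi}_n(\gamma(s))(x - y), \quad \forall s \in (0, 1).$$

Since the expression on the right-hand side is continuous in $s$, one obtains with the
fundamental theorem of calculus

$$P^k \tilde{\varphi}_n(x) - P^k \tilde{\varphi}_n(y) = P^k \tilde{\varphi}_n(\gamma(1)) - P^k \tilde{\varphi}_n(\gamma(0)) = \int_0^1 \nabla P^k \tilde{\varphi}_n(\gamma(s))(x - y) \, ds.$$

Iteratively applying the estimate in (5.5), one has

$$\|\nabla P^k \tilde{\varphi}_n\|_\infty \le C \sum_{j=0}^{k-1} \alpha^j \|\tilde{\varphi}_n\|_\infty + \alpha^k \|\nabla \tilde{\varphi}_n\|_\infty.$$

Since $\|\tilde{\varphi}_n\|_\infty \le 1$ and $\|\nabla \tilde{\varphi}_n\|_\infty \le \alpha^{-k/2}$, this yields

$$P^k \tilde{\varphi}_n(x) - P^k \tilde{\varphi}_n(y) \le c(C, \alpha) \|x - y\|,$$

where $c(C, \alpha) := \alpha^{1/2} + C/(1 - \alpha)$. Letting $n \to \infty$, we have by bounded
convergence

$$P^k \varphi(x) - P^k \varphi(y) \le c(C, \alpha) \|x - y\|.$$

As this estimate holds for all $\varphi \in \mathrm{Lip}_1(d_k)$,

$$\|\delta_x P^k - \delta_y P^k\|_{d_k} \le c(C, \alpha) \|x - y\|.$$

Now, for $\varepsilon > 0$ fixed, let $U$ be the open $\| \cdot \|$-ball of radius $\varepsilon$ centered at $x$. For any
$z \in U$,

$$\|\delta_x P^k - \delta_z P^k\|_{d_k} \le c(C, \alpha) \varepsilon,$$

hence

$$\inf\Big\{ \limsup_{k \to \infty} \sup_{z \in U} \|\delta_x P^k - \delta_z P^k\|_{d_k} : U \text{ open}, \ x \in U \Big\} \le c(C, \alpha) \varepsilon.$$

Since $\varepsilon$ was arbitrarily chosen, $P$ is asymptotically strong Feller. □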
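
A simple example, not taken from the text, may help to see what condition (5.5) allows. Assume $H \neq \{0\}$, fix $\alpha \in (0, 1)$, and consider the deterministic kernel $P(x, \cdot) = \delta_{\alpha x}$. The computation below suggests that this kernel satisfies (5.5) for every $C > 0$, so that Theorem 5.30 applies, although it is not strong Feller.

```latex
% Hypothetical example: the contraction kernel Pf(x) = f(\alpha x), \alpha \in (0, 1).
\begin{align*}
Pf(x) &= f(\alpha x), \qquad f \in \mathcal{F}(H), \\
\nabla Pf(x) h &= \alpha \, \nabla f(\alpha x) h, \qquad x, h \in H, \\
\|\nabla Pf\|_\infty &= \alpha \|\nabla f\|_\infty
  \le C \|f\|_\infty + \alpha \|\nabla f\|_\infty .
\end{align*}
% Failure of the strong Feller property: for f = \mathbf{1}_{\{0\}} one has
% Pf = \mathbf{1}_{\{0\}}, which is bounded and measurable but not continuous at 0.
```

Here $Pf$ is bounded, Fréchet differentiable, and $x \mapsto \nabla Pf(x) h$ is continuous, so $Pf \in \mathcal{F}(H)$ as the theorem requires.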


Remark 5.31 The proof of Theorem 5.30 uses the following approximation result:
For every $\varphi \in \mathrm{Lip}_1(d_k)$ there exists a sequence $(\varphi_n)_{n \ge 1}$ in $\mathcal{F}(H) \cap \mathrm{Lip}_1(d_k)$ that
converges pointwise to $\varphi$. To see this, let $(e_j)_{j \in J}$ be a complete orthonormal system
in $H$, where either $J = \mathbb{N}^*$ or $J = \{1, \ldots, N\}$ for some $N \in \mathbb{N}^*$. For $t \ge 0$, define
the bounded linear operator

$$A(t) : H \to H, \quad x \mapsto \sum_{j \in J} e^{-j^2 t} \langle x, e_j \rangle e_j,$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product on $H$. The collection of operators $(A(t))_{t \ge 0}$
is a $C_0$-semigroup on $H$, and $\|A(t)\|_{op} \le e^{-t}$ for all $t \ge 0$. For $t \ge 0$, let

$$Q_t : H \to H, \quad x \mapsto \int_0^t A(2s) x \, ds, \qquad (5.6)$$

where the integral is to be interpreted as a Bochner integral. It is not hard to see that
$Q_t$ is of trace class, so there is a well-defined Gaussian measure $\mu_t$ on $(H, \mathcal{B}(H))$
with mean 0 and covariance operator $Q_t$. For $n \in \mathbb{N}^*$, define

$$\varphi_n(x) := \int_H \varphi(A(1/n) x + y) \, \mu_{1/n}(dy), \quad x \in H.$$

It is not hard to check that $A(t)(H) \subset Q_t^{1/2}(H)$ for every $t > 0$. Then, by
Theorem 2.1 in [56], $\varphi_n$ has Fréchet derivatives of any order, and all derivatives
and the function itself are bounded. In particular, $\varphi_n \in \mathcal{F}(H)$ for all $n \in \mathbb{N}^*$.
For $n \in \mathbb{N}^*$ and $x, y \in H$, one has

$$|\varphi_n(x) - \varphi_n(y)| \le \int_H |\varphi(A(1/n) x + z) - \varphi(A(1/n) y + z)| \, \mu_{1/n}(dz)$$
$$\le \int_H d_k(A(1/n) x + z, A(1/n) y + z) \, \mu_{1/n}(dz) \le d_k(x, y).$$

Finally, the pointwise convergence of $(\varphi_n)_{n \ge 1}$ to $\varphi$ follows from Proposition 6.2
in [14].


5.3.3 Unique Ergodicity of Asymptotic Strong Feller Chains 


The following theorem, first shown in [34], provides an important justification for 
introducing the asymptotic strong Feller property. It can be seen as a strengthening 
of Proposition 5.18 (i) for Polish spaces. 


Theorem 5.32 (Hairer, Mattingly) Let $(M, d^*)$ be a Polish space, i.e., a complete
and separable metric space, and let $P$ be a Markov kernel on $(M, \mathcal{B}(M))$. Let $\mu, \nu$
be ergodic measures with respect to $P$. If $P$ is asymptotically strong Feller at a point
$x \in \mathrm{supp}(\mu) \cap \mathrm{supp}(\nu)$, then $\mu = \nu$. In particular, if $P$ is asymptotically strong
Feller, then two distinct ergodic measures have disjoint support.


The proof of Theorem 5.32 requires several tools that we have yet to introduce. We
therefore postpone it to the end of this subsection. Let $(X, e)$ be an arbitrary metric
space and let $\mu, \nu \in \mathcal{P}(X)$. A coupling of $\mu$ and $\nu$ is a probability measure $\Gamma$ on
$(X^2, \mathcal{B}(X) \otimes \mathcal{B}(X))$ such that

$$\Gamma(A \times X) = \mu(A), \quad \Gamma(X \times A) = \nu(A), \quad \forall A \in \mathcal{B}(X).$$

We denote by $\mathcal{C}(\mu, \nu)$ the set of couplings of $\mu$ and $\nu$.


Exercise 5.33 Assume in addition that $X$ is separable and let $\mathcal{P}(X^2)$ be the set of
Borel probability measures on $X^2$, endowed with the topology of weak convergence.
Show that for every $\mu, \nu \in \mathcal{P}(X)$, $\mathcal{C}(\mu, \nu)$ is a closed subset of $\mathcal{P}(X^2)$.


The following exercise explores the concept of lower semicontinuity. Given a
metric space $(X, e)$, a function $f : X \to \mathbb{R}$ is called lower semicontinuous if
$f(x_0) \le \liminf_{x \to x_0} f(x)$ for every $x_0 \in X$.


Exercise 5.34 Let $f : X \to [0, \infty)$ be a function.

(i) Show that

$$\tilde{f}(x) := \inf\{f(y) + e(x, y) : y \in X\}$$

defines a continuous function on $X$.

(ii) Show that $f$ is lower semicontinuous if and only if there exists a nondecreasing
sequence $(f_n)_{n \ge 1}$ of continuous functions from $X$ to $[0, \infty)$ that converges
pointwise to $f$. Hint: Consider the functions $f_n(x) := \inf\{f(y) + n \, e(x, y) :
y \in X\}$, $n \in \mathbb{N}^*$, and use part (i).

The following statement, cited without proof here, can be found in [68] (see
Particular Case 5.16 of Theorem 5.10 for the formula and Theorem 4.1 for
existence of a minimizing coupling). It is an instance of the famous Kantorovich-Rubinstein
duality theorem. The term duality refers to the asserted equivalence of a
maximization and a minimization problem.


Theorem 5.35 Let $(M, d^*)$ be a Polish space and let $d$ be a bounded metric on
$M$ that is lower semicontinuous as a function from the product metric space
$(M \times M, d^* \times d^*)$ to $[0, \infty)$. Then, for every $\mu, \nu \in \mathcal{P}(M)$, we have

$$\|\mu - \nu\|_d = \inf_{\Gamma \in \mathcal{C}(\mu, \nu)} \int_{M^2} d(x, y) \, \Gamma(dx, dy),$$

and the infimum on the right-hand side is attained.
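
For the discrete metric $\delta$ on a finite state space, a minimizing coupling can be written down explicitly. The sketch below builds the classical maximal coupling (a standard construction, used here as an illustration rather than as the method of [68]) and checks that its marginals are $\mu$ and $\nu$ and that its cost $\Gamma(\{x \ne y\})$ equals half the $\ell^1$ distance. The function names and the two probability vectors are hypothetical.

```python
def maximal_coupling(mu, nu):
    """Coupling of mu and nu that puts as much mass as possible on the diagonal."""
    n = len(mu)
    common = [min(m, v) for m, v in zip(mu, nu)]
    off_mass = 1.0 - sum(common)  # mass that cannot sit on the diagonal
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        gamma[i][i] = common[i]
    if off_mass > 0:
        for i in range(n):
            for j in range(n):
                # Product of the normalised excess parts; vanishes when i == j.
                gamma[i][j] += (mu[i] - common[i]) * (nu[j] - common[j]) / off_mass
    return gamma

def cost_discrete(gamma):
    """Integral of the discrete metric against gamma, i.e. gamma({x != y})."""
    n = len(gamma)
    return sum(gamma[i][j] for i in range(n) for j in range(n) if i != j)

mu = [0.5, 0.3, 0.2, 0.0]
nu = [0.1, 0.3, 0.2, 0.4]
gamma = maximal_coupling(mu, nu)
row_marginal = [sum(row) for row in gamma]
col_marginal = [sum(gamma[i][j] for i in range(len(mu))) for j in range(len(nu))]
print(cost_discrete(gamma))
```

In this example the cost is $0.4$, which coincides with $\|\mu - \nu\|_\delta$ computed as half the $\ell^1$ distance, consistent with the duality formula for $d = \delta$.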


Remark 5.36 Let $(M, d^*)$ be a Polish space. The Wasserstein distance of order 1
between $\mu, \nu \in \mathcal{P}(M)$ is defined as

$$W_1(\mu, \nu) := \inf_{\Gamma \in \mathcal{C}(\mu, \nu)} \int_{M^2} d^*(x, y) \, \Gamma(dx, dy).$$

In light of Theorem 5.35, if $d^*$ is bounded, then

$$W_1(\mu, \nu) = \|\mu - \nu\|_{d^*}, \quad \forall \mu, \nu \in \mathcal{P}(M).$$

Exercise 5.20 shows that in this case, $W_1$ is a bounded metric on $\mathcal{P}(M)$. One can
show that the metric space $(\mathcal{P}(M), W_1)$ is Polish as well (see, e.g., Theorem 6.18
in [68]).


Lemma 5.37 Let $(M, d^*)$ be a Polish space, let $(d_n)_{n \ge 1}$ be a nondecreasing
sequence of continuous metrics on $M$, and let $d$ be a bounded metric on $M$ such
that

$$\lim_{n \to \infty} d_n(x, y) = d(x, y), \quad \forall x, y \in M.$$

Then, for every $\mu, \nu \in \mathcal{P}(M)$, we have

$$\lim_{n \to \infty} \|\mu - \nu\|_{d_n} = \|\mu - \nu\|_d.$$

Proof Let $\mu, \nu \in \mathcal{P}(M)$. Since $(d_n)_{n \ge 1}$ is nondecreasing and since $d$ is bounded,
we have

$$\|\mu - \nu\|_{d_n} \le \|\mu - \nu\|_{d_{n+1}} \le \|\mu - \nu\|_d < \infty, \quad \forall n \in \mathbb{N}^*.$$

Therefore,

$$\ell := \lim_{n \to \infty} \|\mu - \nu\|_{d_n}$$

exists and is less than or equal to $\|\mu - \nu\|_d$. By Theorem 5.35, there are couplings
$(\Gamma_n)_{n \ge 1}$ of $\mu$ and $\nu$ such that

$$\|\mu - \nu\|_{d_n} = \int_{M^2} d_n(x, y) \, \Gamma_n(dx, dy), \quad \forall n \in \mathbb{N}^*.$$

Since $\mu$ and $\nu$ are Borel probability measures on a Polish space, they are tight by
Prohorov's theorem 4.13, i.e., for every $\varepsilon > 0$ there is a compact set $K \subset M$
such that $\mu(K), \nu(K) \ge 1 - \varepsilon$. Hence, by Exercise 5.38 below, the family of
couplings $(\Gamma_n)_{n \ge 1}$ is tight as well. Again by Prohorov's theorem, $(\Gamma_n)_{n \ge 1}$ admits a
subsequence that converges weakly to a probability measure $\Gamma_\infty \in \mathcal{P}(M^2)$. And by
Exercise 5.33, $\Gamma_\infty \in \mathcal{C}(\mu, \nu)$. For simplicity, we denote the convergent subsequence
again by $(\Gamma_n)_{n \ge 1}$. For $n \le m$, we have

$$\int_{M^2} d_n(x, y) \, \Gamma_m(dx, dy) \le \int_{M^2} d_m(x, y) \, \Gamma_m(dx, dy) = \|\mu - \nu\|_{d_m} \le \ell.$$

Since each $d_n$ is continuous and bounded, and since $\Gamma_m$ converges weakly to $\Gamma_\infty$,
we have

$$\lim_{m \to \infty} \int_{M^2} d_n(x, y) \, \Gamma_m(dx, dy) = \int_{M^2} d_n(x, y) \, \Gamma_\infty(dx, dy).$$

Thus,

$$\int_{M^2} d_n(x, y) \, \Gamma_\infty(dx, dy) \le \ell.$$

By monotone convergence,

$$\ell \ge \int_{M^2} d(x, y) \, \Gamma_\infty(dx, dy) \ge \inf_{\Gamma \in \mathcal{C}(\mu, \nu)} \int_{M^2} d(x, y) \, \Gamma(dx, dy). \qquad (5.7)$$

Since $d$ is the pointwise limit of a nondecreasing sequence of continuous functions,
Exercise 5.34 implies that $d$ is lower semicontinuous. Hence, by virtue of
Theorem 5.35, the expression on the right-hand side of (5.7) equals $\|\mu - \nu\|_d$. We
have thus shown that $\ell \ge \|\mu - \nu\|_d$, and together with $\ell \le \|\mu - \nu\|_d$ one obtains
$\lim_{n \to \infty} \|\mu - \nu\|_{d_n} = \|\mu - \nu\|_d$. □
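
For Dirac masses the statement of Lemma 5.37 can be checked by hand. The short computation below is only an illustration: $a, b \in M$ are arbitrary points, and it uses that the only coupling of $\delta_a$ and $\delta_b$ is the product measure, together with Theorem 5.35 applied to $d_n$ and to $d$.

```latex
% Illustration for mu = \delta_a, nu = \delta_b with arbitrary points a, b in M.
\begin{align*}
\mathcal{C}(\delta_a, \delta_b) &= \{\delta_a \otimes \delta_b\}, \\
\|\delta_a - \delta_b\|_{d_n}
  &= \int_{M^2} d_n(x, y) \, (\delta_a \otimes \delta_b)(dx, dy) = d_n(a, b), \\
\lim_{n \to \infty} \|\delta_a - \delta_b\|_{d_n}
  &= \lim_{n \to \infty} d_n(a, b) = d(a, b) = \|\delta_a - \delta_b\|_d .
\end{align*}
```

In this special case the convergence of the norms is just the pointwise convergence of the metrics at the single pair $(a, b)$.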


Exercise 5.38 Let $(X, e)$ be a metric space and let $\mu, \nu \in \mathcal{P}(X)$ be tight. Show that
$\mathcal{C}(\mu, \nu) \subset \mathcal{P}(X^2)$ is a tight family of probability measures.


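Lemma 5.39 Let $(M, d^*)$ be a separable metric space, let $P$ be a Markov kernel
on $(M, \mathcal{B}(M))$, and let $d$ be a metric on $M$ that is bounded by 1. Assume further
that there are $\varepsilon > 0$ and $U \in \mathcal{B}(M)$ such that

$$\sup_{x, y \in U} \|\delta_x P - \delta_y P\|_d \le \varepsilon.$$

Let $\mu, \nu \in \mathcal{P}(M)$ and set $\alpha := \mu(U) \wedge \nu(U)$. Then

$$\|\mu P - \nu P\|_d \le 1 - \alpha(1 - \varepsilon).$$

Proof Since $d$ is bounded by 1, we have $\|\mu P - \nu P\|_d \le 1$, so the assertion holds
if $\alpha = 0$. If $\alpha > 0$, define for $A \in \mathcal{B}(M)$ the Borel probability measures

$$\mu^U(A) := \frac{\mu(A \cap U)}{\mu(U)}, \quad \nu^U(A) := \frac{\nu(A \cap U)}{\nu(U)},$$

$$\bar{\mu}(A) := \frac{\mu(A) - \alpha \mu^U(A)}{1 - \alpha}, \quad \bar{\nu}(A) := \frac{\nu(A) - \alpha \nu^U(A)}{1 - \alpha},$$

and observe that

$$\mu = (1 - \alpha) \bar{\mu} + \alpha \mu^U,$$

$$\nu = (1 - \alpha) \bar{\nu} + \alpha \nu^U.$$

Let $\varphi \in \mathrm{Lip}_1(d)$. Exercise 5.40 below and the fact that $\mu^U(U^c) = \nu^U(U^c) = 0$
yield

$$(\mu^U P)\varphi - (\nu^U P)\varphi = \int_{U^2} \big((\delta_x P)\varphi - (\delta_y P)\varphi\big) \, \mu^U(dx) \, \nu^U(dy)$$
$$\le \int_{U^2} \|\delta_x P - \delta_y P\|_d \, \mu^U(dx) \, \nu^U(dy) \le \varepsilon.$$

Taking the supremum over $\mathrm{Lip}_1(d)$ gives

$$\|\mu^U P - \nu^U P\|_d \le \varepsilon.$$

The triangle inequality for $\| \cdot \|_d$ then implies

$$\|\mu P - \nu P\|_d \le (1 - \alpha) \|\bar{\mu} P - \bar{\nu} P\|_d + \alpha \|\mu^U P - \nu^U P\|_d \le 1 - \alpha + \alpha \varepsilon.$$

□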
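
The bound of Lemma 5.39 can be tested on a toy example. The sketch below takes $d = \delta$ on a three-point space, so that $\| \cdot \|_d$ is half the $\ell^1$ distance by Remark 5.21. The transition matrix, the set $U$, and the two initial laws are hypothetical choices made only for illustration.

```python
def tv(p, q):
    """||p - q||_delta on a finite space: half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def push(mu, P):
    """Row vector times matrix: (mu P)(j) = sum_i mu(i) P(i, j)."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(P[0]))]

# Hypothetical kernel: states 0 and 1 have similar transition laws, state 2 is absorbing.
P = [
    [0.5, 0.5, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
U = [0, 1]
mu = [0.7, 0.1, 0.2]
nu = [0.2, 0.3, 0.5]

eps = max(tv(P[x], P[y]) for x in U for y in U)             # sup over x, y in U
alpha = min(sum(mu[i] for i in U), sum(nu[i] for i in U))   # mu(U) ∧ nu(U)
lhs = tv(push(mu, P), push(nu, P))
rhs = 1 - alpha * (1 - eps)
print(eps, alpha, lhs, rhs)
```

With these numbers $\varepsilon = 0.1$, $\alpha = 0.5$, the left-hand side is $0.3$ and the bound is $0.55$, so the inequality holds with room to spare.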


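Exercise 5.40 Let $(M, d^*)$ be a separable metric space, let $P$ be a Markov kernel on
$(M, \mathcal{B}(M))$, and let $d$ be a bounded metric on $M$. Show that for every $\mu, \nu \in \mathcal{P}(M)$,
$\Gamma \in \mathcal{C}(\mu, \nu)$, and $\varphi \in \mathrm{Lip}_1(d)$, one has

$$(\mu P)\varphi - (\nu P)\varphi = \int_{M^2} \big((\delta_x P)\varphi - (\delta_y P)\varphi\big) \, \Gamma(dx, dy).$$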


We are now ready to prove Theorem 5.32. 


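Proof (Theorem 5.32) Let $x \in \mathrm{supp}(\mu) \cap \mathrm{supp}(\nu)$ be such that $P$ is asymptotically
strong Feller at $x$. Then there exist a nondecreasing sequence $(n_k)_{k \ge 1}$ of positive
integers as well as a nondecreasing sequence $(d_k)_{k \ge 1}$ of continuous metrics on $M$
such that $\lim_{k \to \infty} d_k(y, z) = \delta(y, z)$, $y, z \in M$, and

$$\inf_{U \text{ open}, \ x \in U} \ \limsup_{k \to \infty} \sup_{y \in U} \|\delta_x P^{n_k} - \delta_y P^{n_k}\|_{d_k} = 0.$$

Let $U$ be an open neighborhood of $x$ and let $K \in \mathbb{N}$ be such that

$$\sup_{y \in U} \|\delta_x P^{n_k} - \delta_y P^{n_k}\|_{d_k} \le \frac{1}{4}, \quad \forall k \ge K.$$

Since $\| \cdot \|_d$ satisfies the triangle inequality for every metric $d$ on $(M, d^*)$, we have

$$\sup_{y, z \in U} \|\delta_y P^{n_k} - \delta_z P^{n_k}\|_{d_k} \le \frac{1}{2}, \quad \forall k \ge K.$$

Set $\alpha := \mu(U) \wedge \nu(U)$. Lemma 5.39 implies

$$\|\mu P^{n_k} - \nu P^{n_k}\|_{d_k} \le 1 - \frac{\alpha}{2}, \quad \forall k \ge K.$$

Since $\mu$ and $\nu$ are invariant probability measures,

$$\|\mu - \nu\|_{d_k} \le 1 - \frac{\alpha}{2}, \quad \forall k \ge K.$$

As

$$\lim_{k \to \infty} \|\mu - \nu\|_{d_k} = \|\mu - \nu\|_\delta$$

by Lemma 5.37, it follows that

$$\|\mu - \nu\|_\delta \le 1 - \frac{\alpha}{2}.$$

Since $x \in \mathrm{supp}(\mu) \cap \mathrm{supp}(\nu)$, we have $\alpha > 0$, so $\|\mu - \nu\|_\delta < 1$. In particular, for
every $A \in \mathcal{B}(M)$,

$$2 > |\mu(\mathbf{1}_A - \mathbf{1}_{A^c}) - \nu(\mathbf{1}_A - \mathbf{1}_{A^c})| = 2 |\mu(A) - \nu(A)|$$

in view of Remark 5.21. This implies that $\mu$ and $\nu$ are not mutually singular. Since
$\mu$ and $\nu$ are ergodic, it follows from Proposition 4.29 (ii) that $\mu = \nu$. □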


For a Markov kernel P, let Erg(P) denote the set of P-ergodic measures. From 
the proof of Theorem 5.32, one obtains the following corollary. 


Corollary 5.41 Let $M$ be a Polish space and let $P$ be asymptotically strong Feller
at a point $x \in M$. Then there exist a neighborhood $U$ of $x$ and an ergodic measure
$\nu$ such that $\pi(U) = 0$ for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$.


Proof Suppose the statement does not hold. Then for every neighborhood $U$ of $x$
there are at least two distinct $\nu_1, \nu_2 \in \mathrm{Erg}(P)$ such that $\nu_1(U), \nu_2(U) > 0$. As in
the proof of Theorem 5.32, one then shows existence of distinct $\nu_1, \nu_2 \in \mathrm{Erg}(P)$
that are not mutually singular, which contradicts Proposition 4.29. □


In the following proposition, we exploit Theorem 5.32 and its corollary to further 
elucidate the structure of Erg(P) under the asymptotic strong Feller property. In 
particular, we obtain a counterpart of Proposition 5.18 (iii). 


Proposition 5.42 Let $M$ be a Polish space and let $P$ be asymptotically strong
Feller.

(i) The set $\mathrm{Erg}(P)$ is countable, and for every $P$-invariant probability measure $\mu$
one has

$$\mu = \sum_{\nu \in \mathrm{Erg}(P)} \nu(\cdot) \, \mu(X(\nu)),$$

where $X(\nu) = \{x \in M : Q(x, \mathrm{supp}\,\nu) = 1\}$ and $Q$ is the Markov kernel from
Theorem 4.51;

(ii) If $P$ has an invariant probability measure having full support, then

$$M = \bigcup^{*}_{\nu \in \mathrm{Erg}(P)} \mathrm{supp}\,\nu,$$

where the asterisk indicates that $\mathrm{supp}\,\nu_1 \cap \mathrm{supp}\,\nu_2 = \emptyset$ for distinct
$\nu_1, \nu_2 \in \mathrm{Erg}(P)$;

(iii) If $P$ has an invariant probability measure $\mu$ having full support and if $M$ is
connected, then $\mathrm{Erg}(P)$ is either countably infinite or $\mathrm{Erg}(P) = \{\mu\}$;

(iv) Suppose that $P$ has an invariant probability measure $\mu$ having full support.
Assume in addition that for every $\varepsilon > 0$ there exists a connected compact set
$K \subset M$ such that $\mu(K) \ge 1 - \varepsilon$. Then $\mathrm{Erg}(P) = \{\mu\}$.


Remark 5.43 The condition from part (iv) that for every $\varepsilon > 0$ there is a connected
compact set $K \subset M$ with $\mu(K) \ge 1 - \varepsilon$ clearly holds if $M$ is connected and
compact. But it also holds, for instance, if $M$ is a separable Banach space or, more
generally, a separable Fréchet space. By Fréchet space we mean a locally convex
topological vector space whose topology is induced by a complete metric $d$ that
satisfies $d(x + z, y + z) = d(x, y)$ for every $x, y, z \in M$. Indeed, since Borel
probability measures on a Polish space are tight, for every $\varepsilon > 0$ there is a compact
set $K \subset M$ such that $\mu(K) \ge 1 - \varepsilon$. Let $\hat{K}$ be the closure of the convex hull of
$K$. Then $\mu(\hat{K}) \ge 1 - \varepsilon$ and $\hat{K}$ is convex, hence connected. By Theorem 3.20 (c)
in [61], $\hat{K}$ is also compact as the closure of the convex hull of a compact set in a
Fréchet space.


The following lemma is used in the proof of Proposition 5.42. 


Lemma 5.44 Let $M$ be a Polish space and let $P$ be asymptotically strong Feller at a
point $x \in M$. If there is an invariant probability measure $\mu$ such that $x \in \mathrm{supp}\,\mu$,
then $x \in \mathrm{supp}\,\nu$ for some $\nu \in \mathrm{Erg}(P)$.


Proof By Corollary 5.41, there are a neighborhood $U_x$ of $x$ and $\nu \in \mathrm{Erg}(P)$ such
that $\pi(U_x) = 0$ for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$. To see that $x \in \mathrm{supp}\,\nu$, fix a
neighborhood $U$ of $x$. Then $U \cap U_x$ is also a neighborhood of $x$. Since $x \in \mathrm{supp}\,\mu$,
one has $\mu(U \cap U_x) > 0$. By the ergodic decomposition theorem 4.51, there is a
Markov kernel $Q$ such that $Q(y, \cdot) \in \mathrm{Erg}(P)$ for $\mu$-almost every $y \in M$, and

$$0 < \mu(U \cap U_x) = \int_M Q(y, U \cap U_x) \, \mu(dy).$$

Hence, there is $y \in M$ such that $Q(y, \cdot) \in \mathrm{Erg}(P)$ and $Q(y, U \cap U_x) > 0$. It
follows that $Q(y, U_x) > 0$, so $Q(y, \cdot) = \nu$. Consequently,

$$\nu(U) = Q(y, U) \ge Q(y, U \cap U_x) > 0.$$

□


Proof (Proposition 5.42) 


(i) Since $M$ is separable, so is its subset $S := \bigcup_{\nu \in \mathrm{Erg}(P)} \mathrm{supp}\,\nu$ (see
Exercise 4.45 (i)). Let $D$ be countable and dense in $S$. By Theorem 5.32, the
supports of distinct $P$-ergodic measures are disjoint. To show that $\mathrm{Erg}(P)$ is
countable, it is enough to prove that for every $\nu \in \mathrm{Erg}(P)$ there is $x \in D$
with $x \notin \bigcup_{\pi \in \mathrm{Erg}(P) \setminus \{\nu\}} \mathrm{supp}\,\pi$. Let $\nu \in \mathrm{Erg}(P)$ and let $y \in \mathrm{supp}\,\nu$. By
Corollary 5.41, there is an open neighborhood $U$ of $y$ such that $\pi(U) = 0$ for
every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$. Since $y \in S$ and since $D$ is dense in $S$, there is a
point $x \in D \cap U$. As $U$ is a neighborhood of $x$, one has $x \notin \mathrm{supp}\,\pi$ for every
$\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$.

By Theorem 5.32, $X(\nu) \cap X(\pi) = \emptyset$ whenever $\nu, \pi \in \mathrm{Erg}(P)$ are distinct.
Let

$$\mathcal{X} := \{x \in M : Q(x, \cdot) \in \mathrm{Erg}(P)\}.$$

Theorem 4.51 implies that there is $B \in \mathcal{B}(M)$ such that $B \subset \mathcal{X}$ and $\mu(B) = 1$.
Since $\mathcal{X} \subset \bigcup_{\nu \in \mathrm{Erg}(P)} X(\nu)$, one has

$$B \subset \bigcup_{\nu \in \mathrm{Erg}(P)} X(\nu)$$

and hence

$$1 = \mu(B) = \mu\Big(B \cap \bigcup_{\nu \in \mathrm{Erg}(P)} X(\nu)\Big) = \mu\Big(\bigcup_{\nu \in \mathrm{Erg}(P)} (B \cap X(\nu))\Big).$$

With Theorem 4.51, this yields for every $A \in \mathcal{B}(M)$

$$\mu(A) = \int_B Q(x, A) \, \mu(dx) = \sum_{\nu \in \mathrm{Erg}(P)} \int_{B \cap X(\nu)} Q(x, A) \, \mu(dx). \qquad (5.8)$$

For $\nu \in \mathrm{Erg}(P)$, let $x \in B \cap X(\nu)$. Since $x \in B \subset \mathcal{X}$, we have
$Q(x, \cdot) \in \mathrm{Erg}(P)$. And since $x \in X(\nu)$, we have $Q(x, \mathrm{supp}\,\nu) = 1$. Then, by
Theorem 5.32, $Q(x, \mathrm{supp}\,\pi) = 0$ for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$. In particular,
$Q(x, \cdot) = \nu$. Thus, the expression on the right-hand side of (5.8) equals

$$\sum_{\nu \in \mathrm{Erg}(P)} \int_{B \cap X(\nu)} \nu(A) \, \mu(dx) = \sum_{\nu \in \mathrm{Erg}(P)} \nu(A) \, \mu(B \cap X(\nu)) = \sum_{\nu \in \mathrm{Erg}(P)} \nu(A) \, \mu(X(\nu)).$$

(ii) This follows immediately from Lemma 5.44 and Theorem 5.32.

(iii) In light of part (i) and the ergodic decomposition theorem, all we need to show
is that if $\mathrm{Erg}(P)$ is finite, then it has cardinality 1. Suppose that $\mathrm{Erg}(P) =
\{\nu_1, \ldots, \nu_n\}$, where $n$ is a positive integer and $\nu_1, \ldots, \nu_n$ are pairwise distinct.
By part (ii), $M$ is a finite and disjoint union of nonempty closed sets. As $M$
is connected, this is only possible if $n = 1$.

(iv) The following claim will be proved later on:


Claim 1 If $K \subset M$ is connected and compact, then there is $\nu \in \mathrm{Erg}(P)$ such that
$\pi(K) = 0$ for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$.

We use Claim 1 now to show


Claim 2 There are $S \in \mathcal{B}(M)$ and $\nu \in \mathrm{Erg}(P)$ such that $\mu(S) = 1$ and $\pi(S) = 0$
for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$.


By assumption, for every integer $n \ge 2$ there is a connected compact set $K_n \subset M$
such that $\mu(K_n) > 1 - \frac{1}{n}$. Set

$$S := \bigcup_{n \ge 2} K_n.$$

Then, for every $m \ge 2$,

$$\mu(S) \ge \mu(K_m) > 1 - \frac{1}{m},$$

which implies $\mu(S) = 1$.

Claim 1 implies that for every $n \ge 2$ there is $\nu_n \in \mathrm{Erg}(P)$ such that $\pi(K_n) = 0$
for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu_n\}$. Set $\nu := \nu_2$. To show that $\pi(S) = 0$ for every
$\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$, it is then sufficient to prove $\nu_n = \nu$ for every $n \ge 2$. Suppose
this is not the case. Then there is $n > 2$ such that $\nu_n \ne \nu$. Since $\pi(K_n) = 0$ for
every $\pi \in \mathrm{Erg}(P) \setminus \{\nu_n\}$ and $\pi(K_2) = 0$ for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu\}$, we have in
particular $\pi(K_n \cap K_2) = 0$ for every $\pi \in \mathrm{Erg}(P)$. On the other hand,

$$\mu(K_n \cap K_2) = \mu(K_n) + \mu(K_2) - \mu(K_n \cup K_2) > 2 - \frac{1}{n} - \frac{1}{2} - 1 \ge 0.$$

Hence, by part (i),

$$0 < \mu(K_n \cap K_2) = \sum_{\pi \in \mathrm{Erg}(P)} \pi(K_n \cap K_2) \, \mu(X(\pi)) = 0,$$

a contradiction. This completes the proof of Claim 2.


Let $S$ and $\nu$ be as stipulated in Claim 2. By the formula in part (i),

$$1 = \mu(S) = \sum_{\pi \in \mathrm{Erg}(P)} \pi(S) \, \mu(X(\pi)) = \nu(S) \, \mu(X(\nu)).$$

In particular, $\mu(X(\nu)) = 1$. Since $X(\pi) \cap X(\gamma) = \emptyset$ for distinct $\pi, \gamma \in \mathrm{Erg}(P)$,
this yields $\mu = \nu$, so $\mu \in \mathrm{Erg}(P)$. As $\mathrm{supp}\,\mu = M$ and as distinct $P$-ergodic
measures have disjoint supports, one obtains $\mathrm{Erg}(P) = \{\mu\}$.

Finally we need to prove Claim 1. By Corollary 5.41, we can associate every
$y \in K$ with an open neighborhood $U_y$ of $y$ and $\nu_y \in \mathrm{Erg}(P)$ such that $\pi(U_y) = 0$
for every $\pi \in \mathrm{Erg}(P) \setminus \{\nu_y\}$. Since $K$ is compact, there are finitely many
$y_1, \ldots, y_n \in K$ such that

$$K \subset \bigcup_{k=1}^{n} U_{y_k}. \qquad (5.9)$$

To simplify notation, we write $\nu_i$ instead of $\nu_{y_i}$ from now on. Let
$\pi \in \mathrm{Erg}(P) \setminus \{\nu_1, \ldots, \nu_n\}$. Then

$$\pi(K) \le \sum_{k=1}^{n} \pi(U_{y_k}) = 0.$$

To prove Claim 1, it remains to show that $\nu_1 = \ldots = \nu_n$. For $1 \le i \le n$, let
$F_i := \mathrm{supp}\,\nu_i \cap K$. As the intersection of two closed sets, each set $F_i$ is closed.
Besides, $F_i$ is nonempty: Clearly $y_i \in K$, and since $\pi(U_{y_i}) = 0$ for every
$\pi \in \mathrm{Erg}(P) \setminus \{\nu_i\}$, part (ii) yields $y_i \in \mathrm{supp}\,\nu_i$. Together with (5.9), part (ii) also
implies $K = \bigcup_{i=1}^{n} F_i$. Moreover, $F_i \cap F_j = \emptyset$ if $\nu_i \ne \nu_j$. Connectedness of $K$ then
yields $\nu_1 = \ldots = \nu_n$. □


Notes 


The notion of $\xi$-irreducibility introduced at the beginning of Sect. 5.1 is called
$\varphi$-irreducibility in [49]. For the resolvent kernel $R_a$, Meyn and Tweedie [49] use the
notation $K_{a_\varepsilon}$, where $\varepsilon \in (0, 1)$ corresponds to our parameter $a$. Section 4.5 of [49]
contains additional information, some of it bibliographic, on the use of irreducibility 
in the study of Markov chains. 

The original definition of the asymptotic strong Feller property in [34] is for
Markov semigroups $(P_t)$, where $t \ge 0$ is a continuous-time parameter. Translating
the definition as well as the results of Hairer and Mattingly to the discrete-time
setting is straightforward. Furthermore, the nondecreasing sequence $(d_k)_{k \ge 1}$
converging to $\delta$ is allowed to consist of pseudometrics in [34], i.e., the distances
between distinct points need not be strictly positive.

Most of the material in Sect. 5.3 is taken from [34], sometimes with small 
adaptations (in particular, Proposition 5.26, Theorem 5.30, and Theorem 5.32 along 
with their proofs, including Lemmas 5.37 and 5.39). As far as we know, the 
statements in Proposition 5.42 have not been published elsewhere. 

In the Kantorovich—Rubinstein duality theorem 5.35, the boundedness assump- 
tion on the metric d can be relaxed, see [68]. If d* is unbounded, the Wasserstein 
distance W, defined in Remark 5.36 is still a metric on 


\[ \mathcal{P}_1(M) = \Big\{ \mu \in \mathcal{P}(M) : \int_M d^*(x, y)\, \mu(dy) < \infty \Big\}, \]

the so-called Wasserstein space of order 1. Notice that the choice of $x$ in the
definition of $\mathcal{P}_1(M)$ is arbitrary.


Chapter 6
Petite Sets and Doeblin Points


Often, the $\xi$-irreducibility property, as defined in Chap. 5, can be deduced from
the existence of an accessible point satisfying a local Doeblin condition. These 
conditions turn out to be very useful tools when dealing with specific models such 
as random dynamical systems, processes obtained by random switching between 
deterministic differential equations, or stochastic differential equations. For these 
models, the accessibility condition can be rewritten as a deterministic control 
problem and the local Doeblin conditions can be deduced from more “computable” 
conditions such as, for the last two models, certain Hörmander-type conditions.
This chapter develops these ideas in detail. 


6.1 Petite Sets, Small Sets, Doeblin Points 


We call a measurable set $C$ a petite set if there exist $a \in (0, 1)$ and some nonzero
Borel measure $\xi$ on $M$ such that

\[ R_a(x, A) \ge \xi(A) \]

for all $x \in C$ and $A \in \mathcal{B}(M)$. We call the set $C$ a small set if there is a nonzero
Borel measure $\xi$ on $M$ such that

\[ P(x, A) \ge \xi(A) \]


for all $x \in C$ and $A \in \mathcal{B}(M)$. Clearly, every small set is petite.

Remark 6.1 In the terminology of Meyn and Tweedie [49] (Chapter 5), a $\nu_a$-petite
set for a probability measure $a$ on $\mathbb{N}$ is a set $C \in \mathcal{B}(M)$ such that

\[ \sum_{n=0}^{\infty} a(n) P^n(x, A) \ge \nu_a(A), \quad \forall x \in C,\ A \in \mathcal{B}(M), \]

where $\nu_a$ is some nonzero Borel measure on $M$. A $\nu_m$-small set for $m \in \mathbb{N}^*$ is a set
$C \in \mathcal{B}(M)$ such that

\[ P^m(x, A) \ge \nu_m(A), \quad \forall x \in C,\ A \in \mathcal{B}(M), \]

where $\nu_m$ is a nonzero Borel measure on $M$. With these definitions, the class of
petite sets defined above is equal to the class of sets that are $\nu_{A_a}$-petite for some
$a \in (0, 1)$, where

\[ A_a(k) := a^k (1 - a), \quad k \in \mathbb{N}. \]

Our notion of a small set corresponds to the notion of a $\nu_1$-small set.


We call a point $x^* \in M$ a weak Doeblin point (respectively a Doeblin point) if
$x^*$ has a neighborhood that is a petite set (respectively a small set).

The importance and usefulness of these notions will be highlighted in Chaps. 7
and 8. Here we mainly focus on weak Doeblin points. The following theorem
extends Example 5.3. It provides a powerful tool to ensure unique ergodicity.


Theorem 6.2 Assume that there exists an accessible weak Doeblin point for $P$.
Then $P$ is $\xi$-irreducible. In particular, by Theorem 5.5, it has at most one invariant
probability measure.


Proof By assumption, there exist an open set $C$ and a nontrivial measure $\xi$ such
that $C \cap \Gamma \ne \emptyset$ and $R_a(x, \cdot) \ge \xi(\cdot)$ for all $x \in C$. Let
$p_k = \sum_{i=0}^{k} (1 - a)^2 a^i a^{k-i} = (k + 1)(1 - a)^2 a^k$. Then, for all measurable $A$ and $x \in M$,

\[ \sum_{k \ge 0} p_k P^k(x, A) = R_a^2(x, A) = \int R_a(x, dy)\, R_a(y, A) \ge R_a(x, C)\, \xi(A). \]

By accessibility, $R_a(x, C) > 0$. □
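As a numerical sanity check of the identity used in this proof, the following sketch verifies on a two-state chain that the series with weights $p_k = (k+1)(1-a)^2 a^k$ agrees with $R_a^2$. It assumes that the resolvent is normalized as $R_a = (1-a)\sum_{k \ge 0} a^k P^k$, which is the normalization suggested by Remark 6.1; the transition matrix, the value of $a$, and the truncation level are arbitrary illustrative choices, not taken from the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def weighted_power_series(P, weights):
    """Return sum_k weights[k] * P^k (a finite truncation of the series)."""
    n = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    total = [[0.0] * n for _ in range(n)]
    for w in weights:
        for i in range(n):
            for j in range(n):
                total[i][j] += w * power[i][j]
        power = mat_mul(power, P)
    return total


# Hypothetical example data (not from the text).
P = [[0.9, 0.1], [0.4, 0.6]]
a = 0.5
K = 200  # truncation level; a**K is negligible here

# Assumed normalization: R_a = (1 - a) * sum_k a^k P^k.
R_a = weighted_power_series(P, [(1 - a) * a**k for k in range(K)])
# Weights from the proof: p_k = (k + 1) (1 - a)^2 a^k.
series = weighted_power_series(
    P, [(k + 1) * (1 - a) ** 2 * a**k for k in range(K)]
)
R_a_squared = mat_mul(R_a, R_a)

# Largest entrywise gap between the two sides of the identity.
max_gap = max(
    abs(series[i][j] - R_a_squared[i][j]) for i in range(2) for j in range(2)
)
```

The gap is at the level of floating-point rounding, and each row of `series` sums to one because the weights $p_k$ form a probability distribution on $\mathbb{N}$.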


6.1.1 Continuous Time: Doeblin Points for Markov Processes 


Let $\{P_t\}_{t \ge 0}$ be a continuous-time Markov semigroup. Recall (see Sect. 5.2.1) that
a point $p \in M$ is called accessible for $\{P_t\}_{t \ge 0}$ provided that it is accessible for
the 1-resolvent $G$, or equivalently, $G(x, U) > 0$ for every $x \in M$ and for every
neighborhood $U$ of $p$. The following proposition is a useful tool whose proof is
based on ideas borrowed from [6] and [10].


Proposition 6.3 Let $\{P_t\}_{t \ge 0}$ be a continuous-time weak Feller semigroup. Assume
that there exists a point $p \in M$ which is accessible for $\{P_t\}_{t \ge 0}$ and which is a
Doeblin point for some $P_{T_0}$, with $T_0 > 0$. Then the following statements hold:


(i) There exist $q \in M$ (which can be chosen arbitrarily close to $p$) and $T_1 > T_0$
such that for all $T \ge T_1$, $q$ is an accessible Doeblin point for $P_T$;

(ii) If for some $s > 0$ there exists an invariant probability measure $\mu$ for $P_s$, then
$\mu$ is the unique invariant probability measure of $P_t$ for all $t > 0$.


Remark 6.4 Proposition 6.3 is clearly false in discrete time. Let $M = \{0, 1\}$ and

\[ P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]

The point 0 is an accessible Doeblin point (take $\xi = \delta_1$) but is not
accessible for $P^2$.

The proposition also fails to hold if we replace the condition that $p$ is a Doeblin
point for some $P_{T_0}$ by the weaker condition that it is a Doeblin point for $G$. To see
this, let $(P_t)$ be the semigroup induced by the rotation $x \mapsto (x + t) \bmod 1$ on $\mathbb{R}/\mathbb{Z}$
(see Example 4.55). Then every point $p \in M = \mathbb{R}/\mathbb{Z}$ is accessible and a Doeblin
point for $G$, but not accessible for $P_t$ when $t$ is rational.


Proof of Proposition 6.3 By assumption there exist a neighborhood $U$ of $p$ and a
nontrivial measure $\xi$ such that for all $x \in U$

\[ P_{T_0}(x, \cdot) \ge \xi(\cdot). \]

Lemma 6.5 There exist $T_0' > T_0$, $\varepsilon > 0$, and a measure $\zeta$ such that $\zeta(U) > 0$ and

\[ P_t(x, \cdot) \ge \zeta(\cdot) \]

for all $x \in U$ and $T_0' \le t \le T_0' + \varepsilon$.


Proof By accessibility, $\xi G(U) = \int_0^\infty e^{-t}\, \xi P_t(U)\, dt > 0$. Thus, for some $t_0 \ge 0$,
$\xi P_{t_0}(U) > 0$. Set $\xi' = \xi P_{t_0}$. Then $\xi'(U) > 0$ and for all $x \in U$

\[ \delta_x P_{T_0 + t_0} = \delta_x P_{T_0} P_{t_0} \ge \xi'. \]

By Fatou's lemma, weak Feller continuity and the Portmanteau theorem 4.1,

\[ \liminf_{s \to 0} \xi' P_s(U) \ge \int \liminf_{s \to 0} P_s(x, U)\, \xi'(dx) \ge \int \mathbf{1}_U(x)\, \xi'(dx) = \xi'(U) > 0. \]

Thus, for some $\delta > 0$ and $\varepsilon > 0$,

\[ \xi' P_s(U) \ge \delta \]

for all $0 \le s \le \varepsilon$. Set $T_0' = 2(T_0 + t_0)$. Then, for all $x \in U$ and $0 \le s \le \varepsilon$,

\[ \delta_x P_{T_0' + s} = \delta_x P_{T_0 + t_0} P_{T_0 + t_0 + s} \ge \xi' P_{T_0 + t_0 + s} \ge \int_U \xi' P_s(dy)\, P_{T_0 + t_0}(y, \cdot) \ge \delta\, \xi'. \]

This proves the lemma with $\zeta = \delta \xi'$.

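We now prove the first part of Proposition 6.3. The intervals $[k T_0', k(T_0' + \varepsilon)]$,
$k \in \mathbb{N}^*$, overlap as soon as $k \varepsilon \ge T_0'$, so there is $T_1 > T_0$ such that every $T \ge T_1$
can be written as $T = k(T_0' + s)$ with $0 \le s \le \varepsilon$ and $k \in \mathbb{N}^*$. Let $T \ge T_1$ and
choose such $k$ and $s$. Then, for all $x \in U$,

\[ P_T(x, \cdot) = \delta_x P_T = \delta_x P_{T_0' + s}^k \ge \zeta(U)^{k-1} \zeta. \]

This proves that every point $x \in U$ is a Doeblin point for $P_T$. Choose now $q \in U \cap
\operatorname{supp}(\zeta)$. Let $x \in M$. By accessibility, there exists $t_1 \ge 0$ such that $P_{t_1}(x, U) > 0$.
For $k, m \in \mathbb{N}$ sufficiently large, there exists $t \in [T_0', T_0' + \varepsilon]$ such that $t_1 + kt = mT$.
Then, for every neighborhood $V$ of $q$,

\[ P_T^m(x, V) = P_{mT}(x, V) \ge P_{t_1}(x, U)\, \zeta(U)^{k-1} \zeta(V) > 0. \]

This proves that $q$ is accessible for $P_T$.

The second assertion follows from Theorem 6.2. Suppose that $\mu$ is invariant for
some $P_s$. Then $\nu = \frac{1}{s} \int_0^s \mu P_t\, dt$ is invariant for $(P_t)$ by Proposition 4.56. Thus, $\nu$
is the unique invariant probability measure of $P_t$ for $t \ge T_1$. The same is true for
$0 < t < T_1$ because $kt \ge T_1$ for some $k \in \mathbb{N}$. □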
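The first part of the proof writes each large time $T$ as $k$ copies of a step length taken from a window of width $\varepsilon$ above $T_0'$. The sketch below searches for such a decomposition. The function name `decompose`, the numerical values, and the threshold $T_0'(T_0' + \varepsilon)/\varepsilon$ are illustrative assumptions: this threshold is one sufficient choice of $T_1$, not necessarily the one used in the text.

```python
import math


def decompose(T, T0p, eps, tol=1e-12):
    """Search for (k, s) with T = k * (T0p + s), k >= 1 and 0 <= s <= eps.

    Returns the pair with the smallest admissible k, or None if no pair exists.
    """
    k_max = math.floor(T / T0p)  # a larger k would force s < 0
    for k in range(1, k_max + 1):
        s = T / k - T0p
        if -tol <= s <= eps + tol:
            return k, max(s, 0.0)
    return None


# Hypothetical values standing in for T_0' and epsilon.
T0p, eps = 1.0, 0.6

# Below the threshold a decomposition may fail to exist.
gap_example = decompose(1.7, T0p, eps)

# Above the (assumed sufficient) threshold it always exists.
threshold = T0p * (T0p + eps) / eps
all_found = all(
    decompose(threshold + 0.01 * j, T0p, eps) is not None for j in range(2000)
)
```

With these values, $T = 1.7$ falls in the gap between $[1, 1.6]$ and $[2, 3.2]$, whereas every $T$ above the threshold is covered because consecutive windows $[k T_0', k(T_0' + \varepsilon)]$ overlap once $k\varepsilon \ge T_0'$.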


6.2 Random Dynamical Systems 


Let $\Theta$ be a nonempty open subset of $\mathbb{R}^d$ and $m$ a probability measure on $(\Theta, \mathcal{B}(\Theta))$.
Let $M$ be a nonempty open subset of $\mathbb{R}^k$ and $F : \Theta \times M \to M$ a $C^1$-mapping.
Recall from Chap. 3 that the pair $(F, m)$ induces a random dynamical system with
associated Feller Markov kernel

\[ P(x, G) = m(\{\theta \in \Theta : F_\theta(x) \in G\}), \quad (x, G) \in M \times \mathcal{B}(M). \]

For $n \in \mathbb{N}^*$ and $x \in M$, let

\[ \varphi_{n,x} : \Theta^n \to M, \quad (\theta_1, \dots, \theta_n) \mapsto (F_{\theta_n} \circ \dots \circ F_{\theta_1})(x). \]

The following proposition is essentially Lemma 6.3 in [8].

Proposition 6.6 Let $x^* \in M$, $n \in \mathbb{N}^*$, and $\theta^* = (\theta_1^*, \dots, \theta_n^*) \in \Theta^n$ such that the
following conditions hold.


(a) The Jacobian matrix $D\varphi_{n,x^*}(\theta)|_{\theta = \theta^*}$ has maximal rank (i.e., rank $k$);

(b) There is a neighborhood $V \subset \Theta^n$ of $\theta^*$ such that $m^n(\cdot \cap V)$ is absolutely
continuous with respect to $\lambda^{nd}(\cdot \cap V)$, where $\lambda^{nd}$ is the Lebesgue measure on
$\mathbb{R}^{nd}$. The corresponding probability density function $\rho$ is such that

\[ c := \inf_{\theta \in V} \rho(\theta) > 0. \]

Under these conditions, $x^*$ is a Doeblin point with respect to the Markov kernel $P^n$,
and in particular a weak Doeblin point with respect to $P$.


Remark 6.7 Condition (b) above holds true whenever $m$ is absolutely continuous
with respect to $\lambda^d$ with a lower semicontinuous and positive density.


Proof Since $D\varphi_{n,x^*}(\theta)|_{\theta = \theta^*}$ is a $(k \times nd)$-matrix of rank $k$, we have either $k = nd$
or $k < nd$. To avoid repeating ourselves, we will only prove the slightly more
complicated case $k < nd$. The case $k = nd$ can be easily derived by making small
modifications to the proof for $k < nd$. Assume without loss of generality that the
first $k$ columns of $D\varphi_{n,x^*}(\theta)|_{\theta = \theta^*}$ are linearly independent. We will often write
points $\theta \in \Theta^n$ as $\theta = (\theta^{(k)}, \theta^{(nd-k)})$, where $\theta^{(k)} \in \mathbb{R}^k$ is the vector consisting
of the first $k$ components of $\theta$, and where $\theta^{(nd-k)}$ is the vector of the remaining
$(nd - k)$ components. For $x \in M$, consider the $C^1$-mapping

\[ G_x : \Theta^n \to M \times \mathbb{R}^{nd-k}, \quad \theta = (\theta^{(k)}, \theta^{(nd-k)}) \mapsto (\varphi_{n,x}(\theta), \theta^{(nd-k)}). \]

We also define the $C^1$-mapping

\[ H : \Theta^n \times M \to M \times \mathbb{R}^{nd-k} \times M, \quad (\theta, x) \mapsto (G_x(\theta), x). \]

Since

\[ \det DH(\theta, x)|_{\theta = \theta^*, x = x^*} = \det DG_{x^*}(\theta)|_{\theta = \theta^*}
= \det D_{\theta^{(k)}} \varphi_{n,x^*}\big(\theta^{(k)}, (\theta^*)^{(nd-k)}\big)\big|_{\theta^{(k)} = (\theta^*)^{(k)}} \ne 0, \]

the inverse function theorem implies that there is an open neighborhood $W$
of $(\theta^*, x^*)$ such that the restriction of $H$ to $W$, denoted by $H_W$, is a $C^1$-diffeomorphism.
By intersecting $W$ with an open subset of $V \times M$ that contains
$(\theta^*, x^*)$ and calling the resulting set $W$ again, we may assume without loss of
generality that $\theta \in V$ for every $(\theta, x) \in W$. The set $H(W)$ is a neighborhood
of $H(\theta^*, x^*) = (\varphi_{n,x^*}(\theta^*), (\theta^*)^{(nd-k)}, x^*)$, so there are open neighborhoods $Z_0$ of
$\varphi_{n,x^*}(\theta^*)$, $T_0$ of $(\theta^*)^{(nd-k)}$, and $U_0$ of $x^*$ such that $Z_0 \times T_0 \times U_0 \subset H(W)$. Let
$W_0 := H_W^{-1}(Z_0 \times T_0 \times U_0)$. For $x \in U_0$, set

\[ V_x := \{\theta \in \Theta^n : (\theta, x) \in W_0\}. \]

It is straightforward to check that for every $x \in U_0$, the restriction of $G_x$ to $V_x$ is a
$C^1$-diffeomorphism that satisfies $G_x(V_x) = Z_0 \times T_0$.
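Let $x \in U_0$ and $A \in \mathcal{B}(M)$. We have

\[ P^n(x, A) \ge P^n(x, A \cap Z_0) = \int_{\varphi_{n,x}^{-1}(A \cap Z_0)} m^n(d\theta). \]

Since $G_x^{-1}((A \cap Z_0) \times T_0) \subset \varphi_{n,x}^{-1}(A \cap Z_0)$, the expression on the right-hand side
is bounded from below by

\[ \int_{G_x^{-1}((A \cap Z_0) \times T_0)} m^n(d\theta) \ge \int_{V_x \cap G_x^{-1}((A \cap Z_0) \times T_0)} m^n(d\theta). \]

As $V_x \subset V$, the integral on the right-hand side equals

\[ \int_{V_x \cap G_x^{-1}((A \cap Z_0) \times T_0)} \rho(\theta)\, \lambda^{nd}(d\theta) \ge c \int_{V_x \cap G_x^{-1}((A \cap Z_0) \times T_0)} \lambda^{nd}(d\theta). \]

There is no loss of generality in assuming that $V$ and $U_0$ are each contained in a
compact set. Since the mapping $(\theta, x) \mapsto \det DG_x(\theta)$ is continuous, we have

\[ C := \sup_{\theta \in V,\, x \in U_0} |\det DG_x(\theta)| < \infty. \]

Hence,

\[ P^n(x, A) \ge \frac{c}{C} \int_{V_x \cap G_x^{-1}((A \cap Z_0) \times T_0)} |\det DG_x(\theta)|\, \lambda^{nd}(d\theta). \]

Since the restriction of $G_x$ to $V_x$ is a diffeomorphism, the change of variables
formula (see for instance Theorem 2.47 in [27]) implies that the expression on the
right-hand side equals

\[ \frac{c}{C}\, \lambda^{nd-k}(T_0)\, \lambda^k(A \cap Z_0). \]

The measure $\xi(A) := \frac{c}{C} \lambda^{nd-k}(T_0)\, \lambda^k(A \cap Z_0)$ on $(M, \mathcal{B}(M))$ is nontrivial and does
not depend on $x \in U_0$, so $U_0$ is a small set with respect to the kernel $P^n$. As $U_0$ is a
neighborhood of $x^*$, the point $x^*$ is a Doeblin point with respect to $P^n$. □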


The next theorem, Theorem 6.9, summarizes the consequences of Proposition 6.6
in case $x^*$ is accessible. It is first useful to rephrase the accessibility condition for
the class of Markov chains considered here (i.e., induced by a random dynamical
system).


Proposition 6.8 A point $y \in M$ is accessible from $x \in M$ if and only if for every
neighborhood $U$ of $y$ there exists a finite sequence $\theta_1, \dots, \theta_n$, with $\theta_i \in \operatorname{supp}(m)$
for all $i$, such that $F_{\theta_n} \circ \dots \circ F_{\theta_1}(x) \in U$.


Proof This easily follows from the definitions and the continuity of $\theta \mapsto F_\theta(x)$. □


Recall that a point $y \in M$ is called accessible provided that it is accessible from
every $x \in M$. As usual, we let $\Gamma$ denote the accessible set, i.e., the set of accessible
points.


Theorem 6.9 Assume that there exists an accessible point $x^* \in M$ for which the
assumptions of Proposition 6.6 hold. Then $\Gamma$ has nonempty interior, $P$ has at most
one invariant probability measure $\mu$, and $\operatorname{supp}(\mu) = \Gamma$ provided that $\mu$ exists.

Assume in addition that for every $\theta \in \Theta$, $F_\theta$ is a diffeomorphism from $M$ onto
$F_\theta(M)$. Then $\Gamma = \overline{\operatorname{Int}(\Gamma)}$ and $\mu$, when it exists, is absolutely continuous with
respect to the Lebesgue measure on $\mathbb{R}^k$.


Proof By Proposition 6.6 and Theorem 6.2, $P$ is $\xi$-irreducible. Then $\operatorname{supp}(\xi) \subset
\Gamma$. The proof of Proposition 6.6 shows that $\xi$ is (up to a multiplicative factor) the
Lebesgue measure on some open subset of $\mathbb{R}^k$. Therefore its support has nonempty
interior. Uniqueness of $\mu$, when it exists, follows from Theorem 6.2 and the equality
$\operatorname{supp}(\mu) = \Gamma$ follows from Proposition 5.8 (iii) (bearing in mind that $P$ is Feller
and $\xi$-irreducible, hence indecomposable).

If for all $\theta \in \Theta$, $F_\theta$ is a diffeomorphism, the set

\[ \bigcup_{n \ge 1}\ \bigcup_{(\theta_1, \dots, \theta_n) \in \operatorname{supp}(m)^n} F_{\theta_n} \circ \dots \circ F_{\theta_1}(\operatorname{Int}(\Gamma)) \]

is open and dense (by Proposition 6.8) in $\Gamma$.

The proof of absolute continuity goes as follows. Let $\mu$ be the invariant
probability measure and write its Lebesgue decomposition $\mu = \mu_{ac} + \mu_s$, where $\mu_{ac}$
is absolutely continuous with respect to $\lambda^k$ (written $\mu_{ac} \ll \lambda^k$) and $\mu_s$ is singular.
Since $\xi \ll \lambda^k$ and $\xi \ll \mu$, the absolutely continuous part $\mu_{ac}$ is nonzero. For all
$A \in \mathcal{B}(M)$,

\[ \mu_{ac} P(A) = \int \mu_{ac}(F_\theta^{-1}(A))\, m(d\theta). \]

This shows that $\mu_{ac} P \ll \lambda^k$, because whenever $\lambda^k(A) = 0$ then $\lambda^k(F_\theta^{-1}(A)) = 0$.
Thus, by uniqueness of the Lebesgue decomposition, the equality $\mu_{ac} P + \mu_s P =
\mu_{ac} + \mu_s$ implies that $\mu_{ac} P(\cdot) \le \mu_{ac}(\cdot)$. Thus $\mu_{ac} / \mu_{ac}(M)$ is an excessive probability
measure, hence invariant. By uniqueness of the invariant probability measure,
$\mu = \mu_{ac} / \mu_{ac}(M)$, so $\mu \ll \lambda^k$. □

Example 6.10 (Additive Noise) Recall the setting of Exercise 3.4. We have $M =
\Theta = \mathbb{R}^k$, $F : M \to M$, $F_\theta(x) := F(x) + \theta$ for $(\theta, x) \in \Theta \times M$, and $m(d\theta) =
h(\theta)\, d\theta$, where $h \in L^1(d\theta)$. Assume in addition that $F$ is $C^1$, which implies that
$(\theta, x) \mapsto F_\theta(x)$ is $C^1$ as well. Finally, suppose that there is a nonempty open set
$V \subset \Theta$ such that

\[ \inf_{\theta \in V} h(\theta) > 0. \]

For every $x^* \in M$ and $\theta^* \in \Theta$,

\[ D\varphi_{1,x^*}(\theta)|_{\theta = \theta^*} = \mathbf{1}_{k \times k}, \]

where $\mathbf{1}_{k \times k}$ is the identity matrix of dimensions $(k \times k)$. Since $\mathbf{1}_{k \times k}$ has rank $k$, every
pair $(x^*, \theta^*) \in M \times V$ satisfies the conditions of Proposition 6.6. Hence, every point
$x^* \in M$ is a Doeblin point with respect to the Markov kernel $P(x, G) = m(\{\theta \in
\Theta : F_\theta(x) \in G\})$.


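Exercise 6.11 (Degenerate Additive Noise) Let $m$ be a probability measure on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ that is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$,
with probability density function $h$. Assume further that there is a nonempty open
interval $I \subset \mathbb{R}$ such that

\[ \inf_{\theta \in I} h(\theta) > 0. \]

Show the following statements.


(i) Let $F : \mathbb{R}^2 \to \mathbb{R}^2$, $F = (F_1, F_2)^\top$ be a $C^1$-mapping and let $(x^*, \theta_1^*) \in \mathbb{R}^2 \times I$
such that

\[ \partial_{x_1} F_2(F(x^*) + \theta_1^* e_1) \ne 0, \]

where $e_1 = (1, 0)^\top$. Set

\[ F_\theta(x) := F(x) + \theta e_1, \quad (x, \theta) \in \mathbb{R}^2 \times \mathbb{R}. \]

Then $x^*$ is a weak Doeblin point for the Markov kernel associated with $(F, m)$.

(ii) Let $k \ge 2$ and let $F : \mathbb{R}^k \to \mathbb{R}^k$ be defined by

\[ F(x_1, \dots, x_k) := (x_k, x_1, x_2, \dots, x_{k-1}). \]

Set

\[ F_\theta(x) := F(x) + \theta e_1, \quad (x, \theta) \in \mathbb{R}^k \times \mathbb{R}, \]

where $e_1 = (1, 0, \dots, 0)^\top \in \mathbb{R}^k$. Then any point $x^* \in \mathbb{R}^k$ is a weak Doeblin
point for the Markov kernel associated with $(F, m)$.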
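The rank condition (a) of Proposition 6.6 can be explored numerically before attempting a proof. The following sketch treats the cyclic shift of part (ii) with $k = 3$: it approximates the Jacobian of $\theta \mapsto (F_{\theta_n} \circ \dots \circ F_{\theta_1})(x)$ by central differences and computes its rank by Gaussian elimination. The helper names, the step size, the tolerance, and the test point are illustrative assumptions, and a numerical rank is only an indication, not a proof.

```python
def shift(x):
    """F(x_1, ..., x_k) = (x_k, x_1, ..., x_{k-1})."""
    return [x[-1]] + x[:-1]


def phi(thetas, x):
    """(F_{theta_n} o ... o F_{theta_1})(x) with F_theta(x) = F(x) + theta * e_1."""
    y = list(x)
    for th in thetas:
        y = shift(y)
        y[0] += th
    return y


def jacobian(thetas, x, h=1e-6):
    """Central-difference Jacobian of theta -> phi(theta, x), as a k x n matrix."""
    n, k = len(thetas), len(x)
    cols = []
    for j in range(n):
        up, dn = list(thetas), list(thetas)
        up[j] += h
        dn[j] -= h
        fu, fd = phi(up, x), phi(dn, x)
        cols.append([(fu[r] - fd[r]) / (2 * h) for r in range(k)])
    return [[cols[j][r] for j in range(n)] for r in range(k)]


def rank(matrix, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
        if r == rows:
            break
    return r


x_star = [0.3, -1.2, 2.0]  # hypothetical test point in R^3
ranks = [rank(jacobian([0.1] * n, x_star)) for n in (1, 2, 3)]
```

For this example the rank grows by one with each additional noise variable and reaches $k = 3$ at $n = 3$, which is consistent with looking for a Doeblin point of $P^k$ rather than of $P$ itself.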
6.3 Random Switching Between Vector Fields


Let $E := \{1, \dots, N\}$ be a finite set called a set of environments, and for each
$i \in E$, let $G_i$ be a $C^\infty$-vector field defined on $\mathbb{R}^k$. The choice of $\mathbb{R}^k$ is made here
for simplicity, but we could also assume that the $G_i$'s are defined on a smooth
$k$-dimensional Riemannian manifold.

By the Cauchy-Lipschitz theorem, for every $x \in \mathbb{R}^k$ the initial-value problem

\[ \dot{x}(t) = G_i(x(t)), \qquad x(0) = x \]

has a unique (local) solution $t \mapsto \Phi_i(t, x)$. We assume here that every $G_i$ is
complete, meaning that $t \mapsto \Phi_i(t, x)$ is defined for all $t \in \mathbb{R}$. A classical sufficient
condition for completeness is that

\[ \|G_i(x)\| \le a \|x\| + b, \quad \forall x \in \mathbb{R}^k, \]

for some $a, b \ge 0$. The function $\Phi_i : \mathbb{R} \times \mathbb{R}^k \to \mathbb{R}^k$ is called a flow function.

Let now $M \subset \mathbb{R}^k$ be a nonempty open set positively invariant under each $\Phi_i$,
meaning that $\Phi_i(t, M) \subset M$ for all $t \ge 0$. Consider the non-autonomous differential
equation

\[ \dot{Y}_t = G_{I_t}(Y_t), \tag{6.1} \]

where $t \mapsto I_t \in E$ is a right-continuous control, i.e.,

\[ I_t = i_k \ \text{ for } \ \tau_{k-1} \le t < \tau_k, \quad k \ge 1, \]

\[ 0 = \tau_0 < \tau_1 < \dots < \tau_k < \tau_{k+1} < \dots, \]

for some sequence $(\tau_k)_{k \ge 0}$ with $\lim_{k \to \infty} \tau_k = \infty$. Throughout this section, we shall
assume that the sequence

\[ \theta_1 = (\tau_1, i_1),\ \theta_2 = (\tau_2 - \tau_1, i_2),\ \dots,\ \theta_k = (\tau_k - \tau_{k-1}, i_k),\ \dots \]

forms a sequence of independent identically distributed random variables on $\Theta :=
\mathbb{R}_+ \times E$ having distribution $m$, where

\[ m([0, t] \times \{i\}) = p_i \int_0^t \rho_i(s)\, ds, \tag{6.2} \]

$p_i > 0$ for every $i$, and the densities $\rho_i$ are such that

\[ \inf_{0 < s < R} \rho_i(s) > 0 \]

for some $R > 0$. We also always assume that the initial value $Y_0$ is a random variable
independent of the sequence $(\theta_k)_{k \ge 1}$.

In words, the process $Y = (Y_t)_{t \ge 0}$, with initial value $Y_0$ and solving the
differential equation in (6.1), can be described as follows: Pick an initial pair $(\tau_1, i_1)$
at random according to $m$ and follow the trajectory starting at $Y_0$ and induced by
$G_{i_1}$ for the time $\tau_1$. Then pick a new pair $(\Delta_2, i_2)$ according to $m$, independent of
$(\tau_1, i_1)$, and follow the trajectory starting at $Y_{\tau_1}$ and induced by $G_{i_2}$ for the time
$\Delta_2 = \tau_2 - \tau_1$. Repeating this process defines $(Y_t)_{t \ge 0}$.

The key point here is that, letting $X_n = Y_{\tau_n}$, $(X_n)$ is a Markov chain induced by
the random dynamical system $(F, m)$, where for every $\theta = (t, i) \in \Theta$,

\[ F_\theta : M \to M, \quad x \mapsto \Phi_i(t, x). \]

Its kernel $P$ is then given as

\[ P f(x) = \sum_{i \in E} p_i \int_0^\infty f(\Phi_i(t, x))\, \rho_i(t)\, dt. \tag{6.3} \]

The following exercises give concrete examples of such a Markov chain.


Exercise 6.12 Suppose $E = \{1, 2, 3\}$, $M = \mathbb{R}$, $G_1(x) = \alpha_1$, $G_2(x) = -\alpha_2$,
$G_3(x) = -\alpha_3 x$, and $\rho_i(t) = \lambda_i e^{-\lambda_i t} \mathbf{1}_{t \ge 0}$, $i \in E$, where $\alpha_i, \lambda_i$ are positive numbers.
Prove that the Markov kernel $P$ associated with $(F, m)$ admits a unique invariant
probability measure. Hint: Use Theorem 4.31 on random contractions.
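To make the embedded chain $X_n = Y_{\tau_n}$ concrete, here is a minimal simulation sketch in the setting of Exercise 6.12, where the three flows are available in closed form. The numerical values of $p_i$, $\lambda_i$, $\alpha_i$, the initial point, and the function names are hypothetical choices made only for illustration; the sketch simulates the chain and does not address the uniqueness question of the exercise.

```python
import math
import random


def flow(i, t, x, alpha):
    """Closed-form flow Phi_i(t, x) of G_1 = alpha_1, G_2 = -alpha_2, G_3(x) = -alpha_3 x."""
    if i == 1:
        return x + alpha[1] * t
    if i == 2:
        return x - alpha[2] * t
    return x * math.exp(-alpha[3] * t)


def step(x, p, lam, alpha, rng):
    """One transition X_n -> X_{n+1}: draw theta = (t, i) from m, then apply F_theta."""
    i = rng.choices([1, 2, 3], weights=[p[1], p[2], p[3]])[0]  # environment, law (p_i)
    t = rng.expovariate(lam[i])  # exponential holding time with rate lambda_i
    return flow(i, t, x, alpha)


def simulate(x0, n_steps, p, lam, alpha, seed=0):
    """Return the path X_0, ..., X_{n_steps} of the embedded chain."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(step(path[-1], p, lam, alpha, rng))
    return path


# Hypothetical parameters (not from the text).
p = {1: 0.3, 2: 0.3, 3: 0.4}
lam = {1: 1.0, 2: 2.0, 3: 0.5}
alpha = {1: 1.0, 2: 0.5, 3: 1.0}
path = simulate(5.0, 1000, p, lam, alpha)
```

Averaging a bounded function along `path` gives a Monte Carlo approximation of the empirical occupation measure of the chain; whether and why such averages converge is exactly what the exercise asks to justify.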


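Exercise 6.13 Suppose $E = \{1, 2\}$, $M = \mathbb{R}$, $G_1(x) = \alpha_1$, $G_2(x) = -\alpha_2$, and
$\rho_i(t) = \lambda_i e^{-\lambda_i t} \mathbf{1}_{t \ge 0}$, $i \in E$, where $\alpha_i, \lambda_i$ are positive numbers. Consider the
function

\[ f(t, i) := (-1)^{i+1} \alpha_i t, \quad (t, i) \in \Theta, \]

and the Borel measure

\[ a(A) := m(\{\theta \in \Theta : f(\theta) \in A\}), \quad A \in \mathcal{B}(\mathbb{R}). \]

Show that the Markov kernel $P$ associated with $(F, m)$ satisfies

\[ P(x, G) = a(\{\xi \in \mathbb{R} : x + \xi \in G\}), \quad x \in \mathbb{R},\ G \in \mathcal{B}(\mathbb{R}). \]

Prove that if $p_1 \alpha_1 / \lambda_1 \ne p_2 \alpha_2 / \lambda_2$, $P$ does not admit any invariant probability
measures. Hint: See Example 4.19.

6.3.1 The Weak Bracket Condition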


The main result in this section (Theorem 6.16) is a sufficient condition for the
existence of a weak Doeblin point with respect to the Markov kernel $P$ induced
by $(F, m)$. This condition will be formulated in terms of the Lie algebra generated
by $(G_i)_{i \in E}$. The Lie bracket of two $C^1$-vector fields $G$ and $H$ on a nonempty open
subset $M$ of $\mathbb{R}^k$ is itself a vector field on $M$, defined as

\[ [G, H](x) := DH(x) G(x) - DG(x) H(x), \quad x \in M. \]

Here, $DG(x)$ and $DH(x)$ denote the Jacobian matrices of $G$ and $H$, respectively,
evaluated at the point $x$. The products $DH(x) G(x)$ and $DG(x) H(x)$ are to be
understood as matrix-vector products.

If $\Phi_G$ and $\Phi_H$ denote the respective flow functions of $G$ and $H$, one has the
alternative characterization

\[ [G, H](x) = \frac{\partial}{\partial t} L(t, x)\Big|_{t = 0}, \tag{6.4} \]

where

\[ L(t, x) := \Phi_H\big(-\sqrt{t}, \Phi_G\big(-\sqrt{t}, \Phi_H\big(\sqrt{t}, \Phi_G(\sqrt{t}, x)\big)\big)\big) \]

for $t \ge 0$ and $x \in M$ (see, for example, Proposition 3.b in Chapter 2 of [41]). Notice
that for every fixed $x \in M$, $L(\cdot, x)$ is defined in a neighborhood of 0 because $G$ and
$H$ are $C^1$.
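The flow-based characterization (6.4) can be checked by hand on a pair of planar vector fields with explicit flows. The sketch below uses $G(x) = (1, 0)$ and $H(x) = (0, x_1)$, for which $[G, H](x) = DH(x) G(x) - DG(x) H(x) = (0, 1)$; these two fields, the base point, and the value of $t$ are illustrative assumptions, not examples from the text.

```python
import math


def flow_G(t, x):
    """Flow of the constant field G(x) = (1, 0)."""
    return (x[0] + t, x[1])


def flow_H(t, x):
    """Flow of H(x) = (0, x_1): x_1 stays fixed, x_2 moves with speed x_1."""
    return (x[0], x[1] + t * x[0])


def commutator_path(t, x):
    """L(t, x): follow G, then H, for time sqrt(t), then G, then H, backwards."""
    s = math.sqrt(t)
    return flow_H(-s, flow_G(-s, flow_H(s, flow_G(s, x))))


def difference_quotient(t, x):
    """(L(t, x) - x) / t, which approximates the t-derivative of L at t = 0."""
    y = commutator_path(t, x)
    return ((y[0] - x[0]) / t, (y[1] - x[1]) / t)


x = (0.7, -0.2)        # hypothetical base point
bracket = (0.0, 1.0)   # DH(x) G(x) - DG(x) H(x) for the fields above
approx = difference_quotient(1e-4, x)
```

For this pair one finds $L(t, x) = x + t\,(0, 1)$ exactly, so the difference quotient matches the bracket up to rounding; for general fields the agreement only holds in the limit $t \to 0$.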


Exercise 6.14 (Properties of Lie Brackets)


(i) Show that the Lie bracket $[\cdot, \cdot]$ is bilinear and antisymmetric, i.e., for any $C^1$-vector
fields $A, B, C$ and for any $\lambda \in \mathbb{R}$, one has

\[ [\lambda A, C] = \lambda [A, C], \quad [A + B, C] = [A, C] + [B, C], \quad [A, B] = -[B, A]. \]

Why is this enough to deduce linearity for the second argument?

(ii) To a vector field $A$ on $M$, one can associate the operator on $C^\infty(M, \mathbb{R})$
that maps $f \in C^\infty(M, \mathbb{R})$ to $x \mapsto \langle A(x), \nabla f(x) \rangle$. Here, $\langle \cdot, \cdot \rangle$ denotes the
Euclidean inner product on $\mathbb{R}^k$ and $\nabla f$ denotes the gradient of $f$. This operator
is usually identified with $A$, so one writes $Af$ for the image of $f$ under the
operator. Let $A$ and $B$ be $C^2$-vector fields on $M$. Show that

\[ [A, B] = AB - BA, \]

where $AB$ and $BA$ should be interpreted as compositions of the operators $A$
and $B$.

(iii) Use the result from (ii) to prove the Jacobi identity: For $C^3$-vector fields
$A, B, C$, one has

\[ [A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0. \]


We inductively define a sequence of families of vector fields by $\mathcal{G}_0 := \{G_i\}_{i \in E}$
and $\mathcal{G}_{n+1} := \mathcal{G}_n \cup \{[G_i, V] : i \in E, V \in \mathcal{G}_n\}$ for $n \in \mathbb{N}$. Recall that the linear span
of a set $S$ contained in some vector space is the set of all (finite) linear combinations
of elements in $S$. We say that the weak bracket condition holds at a point $x \in M$ if
the linear span of $\{V(x) : V \in \bigcup_{n \in \mathbb{N}} \mathcal{G}_n\}$ is equal to the full space $\mathbb{R}^k$. As alluded to
earlier, this condition admits an alternative formulation in terms of the Lie algebra
generated by $(G_i)_{i \in E}$. The latter is defined as the smallest linear subspace $\mathcal{L}$ of the
vector space of $C^\infty$-vector fields on $M$ that is closed under Lie brackets ($[G, H] \in \mathcal{L}$
for all $G, H \in \mathcal{L}$) and contains $(G_i)_{i \in E}$.
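For affine vector fields the families of iterated brackets can be computed exactly, which gives a quick way to test the weak bracket condition at a point. The sketch below represents a field $V(x) = Ax + a$ by the pair `(A, a)`, for which the definition of the bracket gives $[G, H](x) = (BA - AB)x + (Ba - Ab)$ when $G = (A, a)$ and $H = (B, b)$. The two fields, the evaluation point, and all function names are hypothetical illustrations.

```python
def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def bracket(G, H):
    """[G, H] = DH G - DG H for affine fields G = (A, a), H = (B, b)."""
    (A, a), (B, b) = G, H
    n = len(a)
    BA, AB = mat_mul(B, A), mat_mul(A, B)
    Ba, Ab = mat_vec(B, a), mat_vec(A, b)
    lin = [[BA[i][j] - AB[i][j] for j in range(n)] for i in range(n)]
    const = [Ba[i] - Ab[i] for i in range(n)]
    return (lin, const)


def evaluate(V, x):
    A, a = V
    Ax = mat_vec(A, x)
    return [Ax[i] + a[i] for i in range(len(x))]


def bracket_family(fields, depth):
    """Family number `depth` of the inductive construction, starting from `fields`."""
    family = list(fields)
    for _ in range(depth):
        family = family + [bracket(Gi, V) for Gi in fields for V in family]
    return family


def rank(vectors, tol=1e-10):
    """Dimension of the linear span of a list of vectors (Gaussian elimination)."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [rows[i][j] - f * rows[r][j] for j in range(len(rows[i]))]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical fields on R^2: G_1(x) = (1, 0) and G_2(x) = (0, x_1).
G1 = ([[0.0, 0.0], [0.0, 0.0]], [1.0, 0.0])
G2 = ([[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0])
x = [0.0, 0.0]

rank_depth_0 = rank([evaluate(V, x) for V in bracket_family([G1, G2], 0)])
rank_depth_1 = rank([evaluate(V, x) for V in bracket_family([G1, G2], 1)])
```

At the origin the two fields themselves span only a line, since $G_2(0) = 0$, but adding the first-order brackets yields the full space $\mathbb{R}^2$, so the weak bracket condition holds there for this pair.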


Exercise 6.15 Let $\mathcal{L}$ denote the Lie algebra generated by $(G_i)_{i \in E}$.


(i) Show that $\mathcal{G}_n \subset \mathcal{L}$ for all $n \in \mathbb{N}$.

(ii) Deduce from (i) that the weak bracket condition at a point $x$ implies that
$\{V(x) : V \in \mathcal{L}\} = \mathbb{R}^k$.

(iii) Show that $\mathcal{G}$, the linear span of $\bigcup_{n \in \mathbb{N}} \mathcal{G}_n$, is closed under Lie brackets. Hint:
This will follow once it is shown that for every $n \in \mathbb{N}$, $A \in \mathcal{G}_n$, and $B \in \mathcal{G}$,
one has $[A, B] \in \mathcal{G}$. The Jacobi identity from Exercise 6.14 may be helpful.

(iv) Conclude that the weak bracket condition holds at a point $x \in M$ if and only if
$\{V(x) : V \in \mathcal{L}\} = \mathbb{R}^k$.


We now state the main result of Sect. 6.3.1. 


Theorem 6.16 If the weak bracket condition holds at a point $x^* \in M$, then there is
$n \in \mathbb{N}$ such that $x^*$ is a Doeblin point with respect to $P^n$. In particular, $x^*$ is a weak
Doeblin point with respect to $P$.


The proof of Theorem 6.16 relies on a slight generalization of Proposition 6.6.
To state this generalization, let $T$ be a Borel subset of $\mathbb{R}^d$ ($d \in \mathbb{N}^*$) with nonempty
interior, and let $E$ be a finite set. Let $m$ be a probability measure on $\Theta := T \times E$,
equipped with the product $\sigma$-field of $\mathcal{B}(T)$ and the power set of $E$. As in Sect. 6.2,
the $n$-fold product measure $m \otimes \dots \otimes m$ will be denoted by $m^n$. Let $M$ be a nonempty
open subset of $\mathbb{R}^k$, $k \in \mathbb{N}^*$, with Borel $\sigma$-field $\mathcal{B}(M)$. Let $F : \Theta \times M \to M$ be a map
such that for every $i \in E$, $(t, x) \mapsto F_{(t,i)}(x)$ is $C^1$. For $n \in \mathbb{N}^*$, $\mathbf{i} = (i_1, \dots, i_n) \in
E^n$, and $x \in M$, let

\[ \varphi^{\mathbf{i}}_{n,x} : T^n \to M, \quad (t_1, \dots, t_n) \mapsto (F_{(t_n, i_n)} \circ \dots \circ F_{(t_1, i_1)})(x). \]

Proposition 6.17 Let $x^* \in M$, $n \in \mathbb{N}^*$, and $t^* = (t_1^*, \dots, t_n^*) \in \operatorname{Int}(T^n)$ such that
the following conditions hold.


(i) There is $\mathbf{i} \in E^n$ such that the Jacobian matrix $D\varphi^{\mathbf{i}}_{n,x^*}(t^*)$ has rank $k$;

(ii) There is a neighborhood $V \subset \operatorname{Int}(T^n)$ of $t^*$ such that $m^n((\cdot \cap V) \times \{\mathbf{i}\})$ is
absolutely continuous with respect to $\lambda^{nd}(\cdot \cap V)$. The corresponding probability
density function $\rho$ can be chosen such that

\[ c := \inf_{t \in V} \rho(t) > 0. \]

Under these conditions, $x^*$ is a Doeblin point with respect to $P^n$, and in particular
a weak Doeblin point with respect to $P$.


Exercise 6.18 Prove Proposition 6.17. Hint: The proof of Proposition 6.6 can 
almost be repeated verbatim. 


The setting of randomly switched vector fields introduced at the beginning of
Sect. 6.3 is clearly covered by the more general setting of Proposition 6.17, with $T =
\mathbb{R}_+$ and $d = 1$. The proof of Theorem 6.16 therefore reduces to checking conditions
(i) and (ii) of Proposition 6.17. While condition (ii) follows almost immediately from the
definition of $m$, establishing condition (i) requires a link between the weak bracket
condition and the full-rank condition on the Jacobian matrix of $\varphi^{\mathbf{i}}_{n,x^*}$. This link is
provided by the following result from geometric control theory, which is implied by
Theorem 1 of Chapter 3 in [41]. To help the reader understand this result, we give
its proof.


Lemma 6.19 Under the assumptions of Theorem 6.16 and for $1 \le j \le k$, the
following statement holds: For every $\varepsilon > 0$ there are $\mathbf{i} \in E^j$ and $t^* \in (0, \varepsilon)^j$ such
that $D\varphi^{\mathbf{i}}_{j,x^*}(t^*)$ has rank $j$.


Proof We prove Lemma 6.19 by induction. In the base case $j = 1$, the weak bracket
condition at $x^*$ implies that there is $i \in E$ such that $G_i(x^*) \ne 0$. Then for every
$\varepsilon > 0$ there is $t^* \in (0, \varepsilon)$ such that $G_i(\Phi_i(t^*, x^*)) \ne 0$. Since

\[ \varphi^{i}_{1,x^*}(t^*) = F_{(t^*, i)}(x^*) = \Phi_i(t^*, x^*), \]

one has

\[ D\varphi^{i}_{1,x^*}(t^*) = G_i(\Phi_i(t^*, x^*)), \]

which has rank 1.

In the induction step, assume that the statement holds for some $1 \le j < k$, and
let $\varepsilon > 0$. Since the weak bracket condition holds at $x^*$, it also holds in an open
neighborhood $M^* \subset M$ of $x^*$. There is no loss of generality in assuming that $\varepsilon$
is so small that $\varphi^{\mathbf{i}}_{j,x^*}(t) \in M^*$ for every $\mathbf{i} \in E^j$ and $t \in (0, \varepsilon)^j$. By induction
hypothesis, there are $\mathbf{i} \in E^j$ and $t^* \in (0, \varepsilon)^j$ such that $D\varphi^{\mathbf{i}}_{j,x^*}(t^*)$ has rank $j$.
Since a full rank is preserved under small perturbations of the matrix entries, there
is an open neighborhood $N$ of $t^*$ in $(0, \varepsilon)^j$ such that $D\varphi^{\mathbf{i}}_{j,x^*}$ has rank $j$ on $N$. The
mapping $\varphi^{\mathbf{i}}_{j,x^*}$ is then a differentiable map between the manifolds $N$ and $M$, and
$D\varphi^{\mathbf{i}}_{j,x^*}$ has constant rank $j$ on $N$. By the constant-rank theorem (see, e.g., Theorem
2.b of Chapter 2 in [41]), there is an open neighborhood $U$ of $t^*$ in $N$ such that
$S := \varphi^{\mathbf{i}}_{j,x^*}(U)$ is an embedded submanifold of $M$ of dimension $j$.

We call a vector field $V$ tangent to $S$ if for every $y \in S$, $V(y)$ is a vector in $T_y S$,
the tangent space with respect to $S$ at the point $y$. We will now show that there is at
least one vector field $G_i$, $i \in E$, that is not tangent to $S$.

Assume this is not the case, i.e., $G_i$ is tangent to $S$ for every $i \in E$. The set of
vector fields tangent to $S$ is clearly closed under linear combinations. It is also closed
under the Lie bracket operation because of the flow-based characterization of the Lie
bracket in (6.4) and the fact that the flow of a vector field tangent to $S$ stays in $S$
for $t$ in a nonempty open interval around 0 (see Proposition 1 of Chapter 2 in [41]).
This shows that every vector field in $\mathcal{L}$, the Lie algebra generated by $(G_i)_{i \in E}$, is
tangent to $S$. Fix an arbitrary point $y \in S$. The submanifold $S$ was defined in such a
way that the weak bracket condition holds at every point in $S$ and in particular at $y$.
Since $V(y) \in T_y S$ for every $V \in \mathcal{L}$, the tangent space $T_y S$ has dimension $k$, which
is strictly larger than $j$. This contradicts the fact that $S$ has dimension $j$.

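Let $y \in S$ and $i_{j+1} \in E$ such that $G_{i_{j+1}}(y) \notin T_y S$. There is $t \in U$ such that
$y = \varphi^{\mathbf{i}}_{j,x^*}(t)$. Then

\[ D\varphi^{(\mathbf{i}, i_{j+1})}_{j+1,x^*}(t, 0) = D_{(t_1, \dots, t_{j+1})} \Phi_{i_{j+1}}\big(t_{j+1}, \varphi^{\mathbf{i}}_{j,x^*}(t_1, \dots, t_j)\big)\Big|_{(t_1, \dots, t_j) = t,\ t_{j+1} = 0} \]
\[ = \big(D\varphi^{\mathbf{i}}_{j,x^*}(t),\ G_{i_{j+1}}(\varphi^{\mathbf{i}}_{j,x^*}(t))\big) = \big(D\varphi^{\mathbf{i}}_{j,x^*}(t),\ G_{i_{j+1}}(y)\big). \]

Since $t \in N$, the matrix $D\varphi^{\mathbf{i}}_{j,x^*}(t)$ has rank $j$. As a result, the columns of
$D\varphi^{\mathbf{i}}_{j,x^*}(t)$ are $j$ linearly independent elements of $T_y S$. Since the $(j + 1)$st column
of $D\varphi^{(\mathbf{i}, i_{j+1})}_{j+1,x^*}(t, 0)$ is not contained in $T_y S$, it follows that $D\varphi^{(\mathbf{i}, i_{j+1})}_{j+1,x^*}(t, 0)$ has rank
$(j + 1)$. Again by virtue of the fact that having full rank is preserved under small
perturbations of the matrix entries, it follows that for $t_{j+1} \in (0, \varepsilon)$ sufficiently small,
$D\varphi^{(\mathbf{i}, i_{j+1})}_{j+1,x^*}(t, t_{j+1})$ has rank $(j + 1)$. □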


We are now ready to prove Theorem 6.16.

Proof (of Theorem 6.16) Let $x^* \in M$ be a point where the weak bracket condition
holds. By Lemma 6.19, there are $\mathbf{i} \in E^k$ and $t^* \in V := (0, R)^k$ such that $D\varphi^{\mathbf{i}}_{k,x^*}(t^*)$
has rank $k$.

For Borel sets $A_1, \dots, A_k \subset (0, R)$ and $A := A_1 \times \dots \times A_k$, we have

\[ m^k\Big( \prod_{l=1}^{k} (A_l \times \{i_l\}) \Big) = \prod_{l=1}^{k} m(A_l \times \{i_l\}) = \int_A \rho(t)\, dt, \]

where

\[ \rho(t) := \prod_{l=1}^{k} p_{i_l}\, \rho_{i_l}(t_l), \]

and thus $\inf_{t \in V} \rho(t) > 0$. The theorem then follows from Proposition 6.17. □


The following proposition is implied by Proposition 6.8 and the definition of a 
right-continuous control at the beginning of Sect. 6.3. 


Proposition 6.20 A point $y \in M$ is accessible from $x \in M$ for $P$ (given by (6.3))
if and only if for every neighborhood $U$ of $y$ there exists a right-continuous control
$j : \mathbb{R}_+ \to E$ such that the solution $t \mapsto x(t)$ to the initial-value problem

\[ \dot{x}(t) = G_{j(t)}(x(t)), \qquad x(0) = x \]

meets $U$. That is, $x(t) \in U$ for some $t \ge 0$.


In the proof of Theorem 6.16 it was shown that if the weak bracket condition
holds at a point $x^* \in M$, then the assumptions of Proposition 6.17 are satisfied for
$n = k$, $T = \mathbb{R}_+$, and $V = (0, R)^k$. Furthermore, by our assumptions on $(G_i)_{i \in E}$, $F_\theta$
is a $C^1$-diffeomorphism (even a $C^\infty$-diffeomorphism) for every $\theta \in \Theta$. In analogy
to Theorem 6.9, one obtains the following corollary. As usual, we let $\Gamma$ denote the
set of points that are accessible from every point in $M$.


Corollary 6.21 If the weak bracket condition holds at an accessible point $x^* \in M$,
then $\Gamma = \overline{\operatorname{Int}(\Gamma)}$ and $P$ has at most one invariant probability measure $\mu$. When it
exists, $\mu$ is absolutely continuous with respect to $\lambda^k$ and $\operatorname{supp}(\mu) = \Gamma$.


6.4 Piecewise Deterministic Markov Processes 


In this section, we keep the notation of the preceding Sect. 6.3 but restrict our
attention to the specific case where the densities $\rho_i$, $i \in E$, appearing in the
definition of the measure $m$ (see (6.2)) are exponential, i.e.,

\[ \rho_i(t) = \lambda_i e^{-\lambda_i t} \mathbf{1}_{t \ge 0} \]

with $\lambda_i > 0$.

We shall consider certain properties of the joint continuous-time process $Z_t =
(Y_t, I_t)$. Such a process is sometimes called in the literature a piecewise deterministic
Markov process, in short a PDMP.

For all $f : M \times E \to \mathbb{R}$ bounded and measurable and for all $t \ge 0$, we let

\[ P_t f(x, i) = \mathbb{E}\big(f(Z_t) \mid Z_0 = (x, i)\big). \tag{6.5} \]


Remark 6.22 Alternatively, one can define $P_t$ as follows. Given $(x, i) \in M \times E$,
let $(Z_t^{x,i}) = (Y_t^{x,i}, I_t^i)$ denote the continuous-time process characterized by

\[ \dot{Y}_t^{x,i} = G_{I_t^i}(Y_t^{x,i}), \qquad Y_0^{x,i} = x, \]

where $(I_t^i)$ is defined like $(I_t)$ with the exception that $\theta_1$ has law $\rho_i(t)\, dt \otimes \delta_i$ instead
of $m$. Then

\[ P_t f(x, i) = \mathbb{E}\big(f(Z_t^{x,i})\big). \]


Proposition 6.23
(i) The semigroup $\{P_t\}_{t \ge 0}$ is weakly Feller;
(ii) $(Z_t)_{t \ge 0}$ is a Markov process with semigroup $\{P_t\}_{t \ge 0}$.


Exercise 6.24 Prove Proposition 6.23. (Hint: Use the memoryless property of the
exponential distribution: $\mathbb{P}(\tau_1 > t + s \mid \tau_1 > t) = \mathbb{P}(\tau_1 > s)$.) Explain why (ii) fails
to hold if the $\rho_i$'s are not exponential.


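Exercise 6.25 Let $C_c^1(M \times E)$ denote the set of maps $f : M \times E \to \mathbb{R}$, $(x, i) \mapsto
f(x, i)$ that are $C^1$ in $x$ and have compact support. For $f \in C_c^1(M \times E)$, we let
$\nabla f(x, i)$ denote the gradient of $x \mapsto f(x, i)$. Let $D, \mathcal{L} : C_c^1(M \times E) \to B(M \times E)$
be the (unbounded) operators defined by

\[ Df(x, i) = \langle \nabla f(x, i), G_i(x) \rangle, \]

and

\[ \mathcal{L} f(x, i) = Df(x, i) + \lambda_i \sum_{j \in E} p_j \big(f(x, j) - f(x, i)\big). \]

Prove that for all $f \in C_c^1(M \times E)$,

\[ \lim_{t \to 0} \frac{P_t f(x, i) - f(x, i)}{t} = \mathcal{L} f(x, i) \]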


and that the convergence is uniform in $x$. In the language of continuous-time Markov
processes, $\mathcal{L}$ is called the infinitesimal generator of the Markov semigroup $\{P_t\}$.

6.4.1 Invariant Measures


The following result relates invariant probability measures of the discrete kernel $P$
(given by (6.3)) to invariant probability measures of $\{P_t\}$.

Theorem 6.26 Let Inv( P) (respectively Inv(( P;))) be the set of invariant probabil- 
; 2 1 

ity measures for P (respectively (P;]). Let c = S20 


G) If u € Inv(P), then à € Inv({P;}), where Ñ is defined by 
oo 
pax (ayer [weir A67 a 
0 


in this formula we think of u as a measure on RŽ that only charges M; 
(ii) Ifv € Inv({P;}), then ? € Inv(P), where ? is defined by 


1 
va) = = uva x ti) 


icE 


Gii) The mappings Inv(P) > Inv({ P;}) : u e Â and Inv({ P;}) > Inv(P) : v e 
V, are inverse to each other; 
(iv) supp(Z) = supp() x E. 


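Proof
(i) Let $\mu$ be a Borel probability measure on $M$. Then, for all $f \in B(M \times E)$,

\[ \hat{\mu}(f) = \frac{\mathbb{E}_{\mu \otimes p}\big(\int_0^{\tau_1} f(Z_s)\, ds\big)}{\mathbb{E}_{\mu \otimes p}(\tau_1)}, \tag{6.6} \]

where $\mu \otimes p$ stands for the product measure $\mu \otimes p = \sum_i p_i\, \mu(dx)\, \delta_i$. Indeed,

\[ \mathbb{E}_{\mu \otimes p}\Big(\int_0^{\tau_1} f(Z_s)\, ds\Big) = \sum_{i \in E} p_i \int_M \mathbb{E}_{x,i}\Big(\int_0^{\tau_1} f(Z_s)\, ds\Big)\, \mu(dx) \]
\[ = \sum_{i \in E} p_i \int_M \int_0^\infty \int_0^t f(\Phi_i(s, x), i)\, ds\, \lambda_i e^{-\lambda_i t}\, dt\, \mu(dx) \]
\[ = \sum_{i \in E} p_i \int_M \int_0^\infty f(\Phi_i(s, x), i)\, e^{-\lambda_i s}\, ds\, \mu(dx) = c^{-1} \hat{\mu}(f), \]

where we used integration by parts. This equality applied with $f = 1$ gives $\mathbb{E}_{\mu \otimes p}(\tau_1) = c^{-1}$, so the formula in (6.6) follows. If now $\mu \in \mathrm{Inv}(P)$ and $Z_0$ has distribution $\mu \otimes p$, then $Z_{\tau_1}$ has the same distribution. A continuous-time version of Exercise 4.24 proves that $\hat{\mu}$ lies in $\mathrm{Inv}(\{P_t\})$. More precisely, for every $t \ge 0$ and $f \in B(M \times E)$, Proposition 6.23 (ii) yields

\[ \mathbb{E}_{\mu \otimes p}\Big(\int_0^{\tau_1} P_t f(Z_s)\, ds\Big) = \mathbb{E}_{\mu \otimes p}\Big(\int_0^\infty \mathbb{E}_{\mu \otimes p}\big(f(Z_{s+t}) \mathbf{1}_{s < \tau_1} \mid \mathcal{F}_s\big)\, ds\Big) \]
\[ = \mathbb{E}_{\mu \otimes p}\Big(\int_0^\infty f(Z_{s+t}) \mathbf{1}_{s < \tau_1}\, ds\Big) \]
\[ = \mathbb{E}_{\mu \otimes p}\Big(\int_t^{t + \tau_1} f(Z_s)\, ds\Big) = \mathbb{E}_{\mu \otimes p}\Big(\int_0^{\tau_1} f(Z_s)\, ds\Big). \]

Here the last equality comes from the fact that

\[ \mathbb{E}_{\mu \otimes p}\Big(\int_{\tau_1}^{\tau_1 + t} f(Z_s)\, ds - \int_0^t f(Z_s)\, ds\Big) = 0 \]

because $(Z_t)_{t \ge 0}$ and $(Z_{\tau_1 + t})_{t \ge 0}$ have the same distribution. In light of (6.6), we have thus shown that $\hat{\mu}(P_t f) = \hat{\mu}(f)$ for every $f \in B(M \times E)$ and $t \ge 0$, hence $\hat{\mu} \in \mathrm{Inv}(\{P_t\})$.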
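(ii) Let now $\nu \in \mathrm{Inv}(\{P_t\})$. We shall show that $\tilde{\nu} \in \mathrm{Inv}(P)$.
Let $K_t, K, \lambda, \lambda^{-1} : B(M \times E) \to B(M \times E)$ and $Q : B(M \times E) \to B(M)$ be the (bounded) operators respectively defined by

\[ K_t f(x, i) = f(\Phi_i(t, x), i), \]
\[ K f(x, i) = \int_0^\infty \lambda_i e^{-\lambda_i t} K_t f(x, i)\, dt, \]
\[ \lambda f(x, i) = \lambda_i f(x, i), \quad \lambda^{-1} f(x, i) = \lambda_i^{-1} f(x, i), \]

and

\[ Q f(x) = \sum_{j \in E} p_j f(x, j). \]

Let $D$ and $\mathcal{L}$ be the unbounded operators on $C_c^1(M \times E)$ as defined in Exercise 6.25.
Let $f \in C_c^1(M \times E)$. Then

\[ \mathcal{L}f = (D - \lambda) f + \lambda Q f. \tag{6.7} \]

One has

\[ \frac{dK_t f}{dt} = K_t D f = D K_t f, \tag{6.8} \]

where, for the second equality, we used
\[ G_i(\Phi_i(t, x)) = D_x \Phi_i(t, x)\, G_i(x). \]

Furthermore,

\[ \int_0^\infty \lambda_i e^{-\lambda_i t}\, \frac{dK_t f}{dt}(x, i)\, dt = -\lambda_i f(x, i) + \lambda_i K f(x, i) \tag{6.9} \]

with integration by parts. The relations in (6.8) and (6.9) together with
\[ \nabla K f(x, i) = \int_0^\infty \lambda_i e^{-\lambda_i t}\, \nabla K_t f(x, i)\, dt \]
justified by $f \in C_c^1(M \times E)$ lead to the identity
\[ K(D - \lambda) f = (D - \lambda) K f = -\lambda f. \]
Thus, with (6.7),
\[ \mathcal{L} K f = (D - \lambda) K f + \lambda Q K f = -\lambda f + \lambda Q K f. \tag{6.10} \]

Let now $f \in C_c^1(M)$. We can see $f$ as an element of $C_c^1(M \times E)$ by setting $f(x, i) = f(x)$. For such $f$ the identity in (6.10) reads

\[ \mathcal{L} K f(x, i) = \lambda_i\big(-f(x) + P f(x)\big). \]

Since $\nu \in \mathrm{Inv}(\{P_t\})$, $\nu P_t(K f) = \nu(K f)$ for all $t \ge 0$, and consequently, by dominated convergence, $\nu \mathcal{L} K f = 0$ (see Exercise 6.25). Hence $\bar{\nu} f = \bar{\nu} P f$ with

\[ \bar{\nu}(dx) = \frac{1}{\sum_{i \in E} \lambda_i\, \nu(M \times \{i\})} \sum_{i \in E} \lambda_i\, \nu(dx \times \{i\}). \]

This proves that $\bar{\nu} \in \mathrm{Inv}(P)$ (use Remark 4.17 with $C = C_c^1(M)$). Finally, observe that the equation $\nu \mathcal{L} f = 0$ applied to $f(x, i) = f(i)$ leads to

\[ \nu(M \times \{i\}) = \frac{p_i}{\lambda_i} \sum_{k \in E} \lambda_k\, \nu(M \times \{k\}) = c\, \frac{p_i}{\lambda_i}, \]

where we used that $\sum_{i \in E} \nu(M \times \{i\}) = 1$. Hence

\[ \bar{\nu}(dx) = \frac{1}{c} \sum_{i \in E} \lambda_i\, \nu(dx \times \{i\}) = \tilde{\nu}(dx). \]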


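(iii) For $\mu \in \mathrm{Inv}(P)$ and $f \in B(M \times E)$, $\hat{\mu}(f) = c\, \mu(Q \lambda^{-1} K f)$. For $\nu \in \mathrm{Inv}(\{P_t\})$ and $f \in B(M)$, $\tilde{\nu}(f) = \frac{1}{c} \nu(\lambda f)$. Thus, for $f \in B(M)$,

\[ \tilde{\hat{\mu}}(f) = \frac{1}{c} \hat{\mu}(\lambda f) = \mu(Q K f) = \mu(P f) = \mu(f) \]

by $P$-invariance. This proves that $\tilde{\hat{\mu}} = \mu$. Conversely, for $\nu \in \mathrm{Inv}(\{P_t\})$ and $f \in C_c^1(M \times E)$,

\[ \nu(f) = \nu(\lambda \lambda^{-1} f) = \nu(\lambda Q K \lambda^{-1} f) = c\, \tilde{\nu}(Q K \lambda^{-1} f) = c\, \tilde{\nu}(Q \lambda^{-1} K f) = \hat{\tilde{\nu}}(f), \]

where the second equality follows from $\nu \mathcal{L} K f = 0$ and identity (6.10). Thus $\nu = \hat{\tilde{\nu}}$.
(iv) This last assertion immediately follows from the other three.

□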


Remark 6.27 By Corollary 6.21 and Theorem 6.26, whenever there exists an accessible point for $P$ (see Proposition 6.20) at which the weak bracket condition holds, $\{P_t\}_{t \ge 0}$ has at most one invariant probability measure.


6.4.2 The Strong Bracket Condition 


We now define a strengthening of the weak bracket condition. Let $\mathcal{G}_0 := \{G_i - G_j : i, j \in E\}$ and $\mathcal{G}_{n+1} := \mathcal{G}_n \cup \{[G_i, V] : i \in E, V \in \mathcal{G}_n\}$ for $n \in \mathbb{N}$. We say that the strong bracket condition holds at a point $x \in M$ if the linear span of $\{V(x) : V \in \bigcup_{n \in \mathbb{N}} \mathcal{G}_n\}$ is equal to the full space $\mathbb{R}^k$. Clearly the strong bracket condition implies the weak one.
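For affine vector fields the families $\mathcal{G}_n$ can be computed by hand or by machine. The sketch below is an illustration under our own assumptions: each field is written $V(x) = Ax + b$ and stored as a pair `(A, b)`, the bracket convention is $[V, W] = DW\,V - DV\,W$ (the span does not depend on the sign convention), and only brackets up to a fixed `depth` are explored. All function names and both example families are hypothetical.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def mat_vec(a, v):
    return [sum(row[c] * v[c] for c in range(len(v))) for row in a]


def difference(v, w):
    (a, b), (c, d) = v, w
    k = len(b)
    return ([[a[r][s] - c[r][s] for s in range(k)] for r in range(k)],
            [b[r] - d[r] for r in range(k)])


def bracket(v, w):
    # [V, W] = DW V - DV W for V(x) = A x + b and W(x) = C x + d.
    (a, b), (c, d) = v, w
    k = len(b)
    ca, ac = mat_mul(c, a), mat_mul(a, c)
    cb, ad = mat_vec(c, b), mat_vec(a, d)
    return ([[ca[r][s] - ac[r][s] for s in range(k)] for r in range(k)],
            [cb[r] - ad[r] for r in range(k)])


def evaluate(v, x):
    a, b = v
    return [y + z for y, z in zip(mat_vec(a, x), b)]


def rank(vectors, tol=1e-9):
    # Rank of a list of vectors by Gaussian elimination with partial pivoting.
    rows = [list(v) for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = max(range(r, len(rows)), key=lambda j: abs(rows[j][col]),
                    default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for j in range(r + 1, len(rows)):
            factor = rows[j][col] / rows[r][col]
            rows[j] = [p - factor * q for p, q in zip(rows[j], rows[r])]
        r += 1
    return r


def strong_bracket_holds(fields, x, depth=3):
    # Build G_0 (differences), then add brackets [G_i, V] `depth` times.
    n = len(fields)
    family = [difference(fields[i], fields[j])
              for i in range(n) for j in range(n) if i != j]
    for _ in range(depth):
        family = family + [bracket(g, v) for g in fields for v in family]
    return rank([evaluate(v, x) for v in family]) == len(x)


A = [[-1.0, -1.0], [1.0, -1.0]]
rotating = [(A, [0.0, 0.0]), (A, [1.0, -1.0])]   # G_2(x) = A (x - (1, 0))
MINUS_ID = [[-1.0, 0.0], [0.0, -1.0]]
radial = [(MINUS_ID, [0.0, 0.0]), (MINUS_ID, [1.0, 0.0])]
```

For the `rotating` pair the difference $G_1 - G_2$ is a constant field and one bracket with $G_1$ produces a second, linearly independent constant field, so the condition holds at every point of the plane. For the `radial` pair every element of the families is a multiple of the same constant vector, so the strong bracket condition fails everywhere.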


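Exercise 6.28 Let $\mathcal{G}_1^* := \{[G_i, G_j] : i, j \in E\}$ and $\mathcal{G}_{n+1}^* := \mathcal{G}_n^* \cup \{[G_i, V] : i \in E, V \in \mathcal{G}_n^*\}$ for $n \in \mathbb{N}^*$.

(i) Show that every vector field in $\big(\bigcup_{n \in \mathbb{N}} \mathcal{G}_n\big) \setminus \mathcal{G}_0$ can be written as a linear combination of vector fields in $\bigcup_{n \in \mathbb{N}^*} \mathcal{G}_n^*$.

(ii) Let $V$ be a vector field in the linear span of $\bigcup_{n \in \mathbb{N}} \mathcal{G}_n$. Show that there exist real numbers $(\alpha_i)_{i \in E}$ with $\sum_{i \in E} \alpha_i = 0$ and a vector field $W$ in the linear span of $\bigcup_{n \in \mathbb{N}^*} \mathcal{G}_n^*$ such that

\[ V = W + \sum_{i \in E} \alpha_i G_i. \]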


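Exercise 6.29 Given a vector field $V$ on $M$, define the vector fields $0 \oplus V$ and $1 \oplus V$ on $\mathbb{R} \times M$ by

\[ (0 \oplus V)(r, x) = (0, V(x)) \quad \text{and} \quad (1 \oplus V)(r, x) = (1, V(x)). \]

Let $\mathcal{U}_1^* = \{[1 \oplus G_i, 1 \oplus G_j] : i, j \in E\}$ and $\mathcal{U}_{n+1}^* = \mathcal{U}_n^* \cup \{[1 \oplus G_i, V] : i \in E, V \in \mathcal{U}_n^*\}$ for $n \in \mathbb{N}^*$. Assume that $V$ is contained in the linear span of $\bigcup_{n \in \mathbb{N}^*} \mathcal{G}_n^*$, which was introduced in Exercise 6.28. Show that $0 \oplus V$ lies in the linear span of $\bigcup_{n \in \mathbb{N}^*} \mathcal{U}_n^*$.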


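Theorem 6.30 If the strong bracket condition holds at a point $x^* \in M$, then, for every $i \in E$ and $t > 0$, $(x^*, i)$ is a Doeblin point with respect to $P_t$.


Proof Let $x^* \in M$ be a point where the strong bracket condition holds, let $t > 0$, and let $i \in E$. Continuity of the vector fields $(G_j)_{j \in E}$ and their Lie brackets implies that $x^*$ admits an open neighborhood $U$ such that the strong bracket condition holds at every point in $U$. Let $\varepsilon_1 \in (0, t)$ be so small that $\Phi_i(t_0, x^*) \in U$ for every $t_0 \in (0, \varepsilon_1)$. Fix $t_0^* \in (0, \varepsilon_1)$ and set $y^* := \Phi_i(t_0^*, x^*)$, where now the strong bracket condition holds as well. For $d \in \mathbb{N}^*$ and $s > 0$, set

\[ \Delta_{d,s} = \{(t_1, \ldots, t_d) \in (0, \infty)^d : t_1 + \ldots + t_d < s\}. \]
For $\mathbf{i} = (i_1, \ldots, i_{d+1}) \in E^{d+1}$, define the functions
\[ F_{\mathbf{i}}^{d,s} : \Delta_{d,s} \times M \to M, \quad ((t_1, \ldots, t_d), x) \mapsto \Phi_{i_{d+1}}\big(s - (t_1 + \ldots + t_d), \cdot\big) \circ \Phi_{i_d}(t_d, \cdot) \circ \ldots \circ \Phi_{i_1}(t_1, x), \]
and
\[ \Psi_{\mathbf{i},x}^{d,s} : \Delta_{d,s} \to M, \quad (t_1, \ldots, t_d) \mapsto F_{\mathbf{i}}^{d,s}((t_1, \ldots, t_d), x). \]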
The proof of Theorem 6.30 is organized as follows: We first show that there exists a sequence of indices $\mathbf{i} = (i_1, \ldots, i_{k+1}) \in E^{k+1}$ and $s^* \in \Delta_{k, t - t_0^*}$ such that $D\Psi_{\mathbf{i}, y^*}^{k, t - t_0^*}(s^*)$ has rank $k$. Then we show that $x^*$ is a Doeblin point with respect to $Q$, the Markov kernel associated with the random dynamical system $(F_{(i, \mathbf{i})}^{k+1, t}, q)$, where $q$ is the normalized Lebesgue measure $q(\cdot) = \lambda^{k+1}(\cdot)/\lambda^{k+1}(\Delta_{k+1, t})$. From this we deduce that $(x^*, i)$ is a Doeblin point with respect to $P_t$.

Let us begin by showing the existence of $\mathbf{i}$ and $s^*$ such that $D\Psi_{\mathbf{i}, y^*}^{k, t - t_0^*}(s^*)$ has rank $k$. On $\mathbb{R} \times M$, consider the vector fields $(1 \oplus G_j)_{j \in E}$ defined as in Exercise 6.29.
We claim that the strong bracket condition at $y^*$ with respect to $(G_j)_{j \in E}$ implies the weak bracket condition at $(r, y^*)$ with respect to $(1 \oplus G_j)_{j \in E}$ for every $r \in \mathbb{R}$.

To see this, let $r \in \mathbb{R}$, $v = (v_1, v_k) \in \mathbb{R} \times \mathbb{R}^k$, and $i^* \in E$. Since the strong bracket condition holds at $y^*$, there exists a vector field $V$ in the linear span of $\bigcup_{n \in \mathbb{N}} \mathcal{G}_n$ such that

\[ v_k - v_1 G_{i^*}(y^*) = V(y^*). \]

By Exercise 6.28 (ii), there exist real numbers $(\alpha_j)_{j \in E}$ with $\sum_{j \in E} \alpha_j = 0$ and a vector field $W$ in the linear span of $\bigcup_{n \in \mathbb{N}^*} \mathcal{G}_n^*$ (defined in Exercise 6.28) such that

\[ V = W + \sum_{j \in E} \alpha_j G_j. \]

Then, by Exercise 6.29, $0 \oplus W$ lies in the linear span of $\bigcup_{n \in \mathbb{N}^*} \mathcal{U}_n^*$ (defined in Exercise 6.29). Let $\mathcal{U}_0 = \{1 \oplus G_j : j \in E\}$ and $\mathcal{U}_{n+1} = \mathcal{U}_n \cup \{[1 \oplus G_j, V] : j \in E, V \in \mathcal{U}_n\}$ for $n \in \mathbb{N}$. It is easy to check that $\mathcal{U}_n^* \subset \mathcal{U}_n$ for every $n \in \mathbb{N}^*$, so $0 \oplus W$ lies in the linear span of $\bigcup_{n \in \mathbb{N}} \mathcal{U}_n$. Now we can write

\[ v = (v_1, v_k) = \big(0, v_k - v_1 G_{i^*}(y^*)\big) + v_1 (1 \oplus G_{i^*})(r, y^*) \]
\[ = (0 \oplus W)(r, y^*) + \sum_{j \in E} \alpha_j (1 \oplus G_j)(r, y^*) + v_1 (1 \oplus G_{i^*})(r, y^*), \]

where we used that $\sum_{j \in E} \alpha_j = 0$. This proves that $v$ lies in the linear span of $\{V(r, y^*) : V \in \bigcup_{n \in \mathbb{N}} \mathcal{U}_n\}$. Accordingly, the weak bracket condition with respect to $(1 \oplus G_j)_{j \in E}$ holds at $(r, y^*)$.

Let (® j)jeE denote the flow functions associated with the vector fields (1 ® 
G j)jeg on M := R x M. Define the maps 


F:0x M 5 M, (s, D, (mx) e Fejl, x) = eG, (x) 
and 


9, o): RE — M, (tyes th) > (Fanin) o. o Fai) x) 


forn e N*,i= (ii, ...,i,) € E", and (r, x) € M. Fix e € (0, £1 A FEL zx. Since the 
weak bracket condition with respect to (1 @ G;) jeg holds at (r, y*), Theorem 6.19 
implies that there are i € E**! and s** = ref a) € (0, &)**! such that 
Dë, q, y») (S7) has rank (k + 1). 
Set $s^* = (t_1^*, \ldots, t_k^*)$ and $\tau = t_0^* + t_1^* + \ldots + t_k^*$. Then $s^* \in \Delta_{k, t - t_0^*}$ and
k,t—të i i m k 
Dij. (s*) = Doy (t — t 9a G7) Dok ys GI) Ls 5 j^ 


Since 6j, ,, (t — v, +) is a diffeomorphism, the matrix Dój, ,, (t — T, ed y* (s**)) is 
invertible. We now show that 


i 1 
Dg ui, (S7) s Has ) (6.11) 


oos : k,t—1to ; T" 
is invertible as well and thus that Dw, SR ? (s*) has rank k. To obtain a contradiction, 
suppose that the matrix in (6.11) is not invertible. Denoting the columns of 
Dg, 4 ys (s**) by a1, ..., ak44, the matrix in (6.11) becomes (a1 — ag41, ..., ak — 


ak+1), SO there exist j € {1,..., k} and real numbers (6))/e(1,.._.4}\{j} such that 


gairi 


dj — ak+1 = b» Bi (ai — ak+1)- 


Then 


Since 


»s 1...1 1...1 

H xK = 

Piri ry So) = "m P (a aa)” 
D TT i 


this implies that Dj 41, (.y*) (s**) has rank strictly less than (k + 1), a contradiction. 
Now we show that x* is a Doeblin point with respect to Q, the Markov kernel 


associated with (s q). To do so, we will apply Proposition 6.17 with Ak+1,t, 


(1). 4, E. and yt playing the roles of T, E, m, F, and Pl respectively. 
Since the finite set {1} consists of a single element, we may identify © from 
Proposition 6.17 with T = Ax+1,;. The measure q is clearly absolutely continuous 
with respect to A**! and has a constant probability density function. To be able 


to invoke Proposition 6.17, it is then enough to show that for t* :— (t5, s*), 
DU. (t*) has rank k. But this follows from 


k+1 k,t—tě 
Da, Vap eC) = Dhi ° 6") 
and the fact that the matrix on the right-hand side has rank $k$, as established in the second step of the proof.

To complete the proof of Theorem 6.30, we argue as follows. Since $x^*$ is a Doeblin point with respect to $Q$, there exist a neighborhood $B \subset M$ of $x^*$ and a nonzero Borel measure $\xi$ on $M$ such that

\[ Q(x, A) \ge \xi(A), \quad \forall x \in B,\ A \in \mathcal{B}(M). \]
Define the event
\[ C = \{\tau_{k+1} < t < \tau_{k+2},\ I_0 = i,\ I_{\tau_l} = i_l \text{ for } 1 \le l \le k+1\}. \]
For every $x \in B$, $A \in \mathcal{B}(M)$, and $j \in E$,

\[ P_t((x, i), A \times \{j\}) \ge \mathbb{P}(Z_t^{x,i} \in A \times \{j\} \mid C)\, \mathbb{P}(C) = \delta_{j, i_{k+1}}\, \mathbb{P}(Y_t^{x,i} \in A \mid C)\, \mathbb{P}(C). \]
Let $(T_0, \ldots, T_{k+1})$ be independent random variables living on some probability space with probability measure $\mathbb{P}$ such that $T_0$ has probability density function $p_i$ and $T_l$ has probability density function $p_{i_l}$ for $1 \le l \le k+1$. Set

\[ R = \{T_0 + \ldots + T_k < t < T_0 + \ldots + T_{k+1}\}. \]
Then 
" ; P k+1, 
P(x, i), A x UD = 954 0) PES (o. ..., Tk), x) € A[R) P(C). 


One has 


PUFG S (To. .... Tk), x) € AIR) 


k 

1 kl 

=y |, ^? ] [046014655 (t. .. te), x» 
k+1,t l=1 


oo 
f Ping thoi) dtk+1 dto . . . dtk 
t—(to+...+t) 


zp Aen OG A) > gei Aen EA), 


where 


k k k 
Č := inf Ài J [> exp(—ait — ut — Kings (: — Y«)) > 0. 
1-1 [=l 


to, .... eA 
(to... tk) E Aul 1-0 


This proves that $(x^*, i)$ is a Doeblin point for $P_t$. □
Corollary 6.31 If the strong bracket condition holds at an accessible point $x^* \in M$, then for all $s > 0$

\[ \mathrm{Inv}(P_s) = \mathrm{Inv}(\{P_t\}) \]

and $\mathrm{Inv}(P_s)$ has at most cardinality one.


Proof This follows from Proposition 6.3 and Theorem 6.30. □


6.5 Stochastic Differential Equations 


This section, and the related Sect. 7.6.2, are not self-contained and require some extra knowledge, namely a certain familiarity with stochastic differential equations.
Let $G_0, G_1, \ldots, G_N$ denote smooth vector fields on $M = \mathbb{R}^k$ (or on a $k$-dimensional manifold). For simplicity, we shall assume here that the vector fields $G_i$ are bounded with bounded first and second derivatives.
We consider the Stratonovich stochastic differential equation on $M$

\[ dX_t = G_0(X_t)\, dt + \sum_{i=1}^N G_i(X_t) \circ dB_t^i, \tag{6.12} \]
where $B = (B_t)_{t \ge 0} = (B_t^1, \ldots, B_t^N)_{t \ge 0}$ is an $N$-dimensional $\{\mathcal{F}_t\}$-Brownian motion, starting from 0, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a (complete) filtration $\{\mathcal{F}_t\}_{t \ge 0}$.
Equivalently, using the Itô formalism,

\[ dX_t = \tilde{G}_0(X_t)\, dt + \sum_{i=1}^N G_i(X_t)\, dB_t^i, \tag{6.13} \]
where

\[ \tilde{G}_0(x) = G_0(x) + \frac{1}{2} \sum_{i=1}^N DG_i(x)\, G_i(x). \]

By classical results (see, e.g., [45, Chapter 8]), given $x \in M$ there exists a unique solution, $X^x = (X_t^x)_{t \ge 0}$, to (6.12) with $X_0^x = x$. Furthermore, $X^x \in C(\mathbb{R}_+, M)$ and the mapping $x \mapsto X^x$ is continuous when $C(\mathbb{R}_+, M)$ is equipped with the topology of uniform convergence on compact intervals.

Let $\{P_t\}_{t \ge 0}$ be the family of operators on $B(M)$ defined by

\[ P_t f(x) = \mathbb{E}(f(X_t^x)). \]
Then it is also classical that $(X_t)$ is a continuous Feller Markov process with semigroup $\{P_t\}_{t \ge 0}$ (see, e.g., [45] or [59]).
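To make the passage from the Stratonovich form (6.12) to the Itô form (6.13) concrete, here is a minimal sketch in dimension $k = 1$ with $N = 1$. It is only an illustration: the Euler-Maruyama scheme is a standard discretization that we swap in for simulation purposes, and the particular fields `g0`, `g1` and all function names are hypothetical choices of ours (they are bounded with bounded derivatives, as assumed in this section).

```python
import math
import random


def ito_drift(g0, g1, dg1):
    """Scalar version of the corrected drift: G0 + (1/2) * DG1 * G1."""
    return lambda x: g0(x) + 0.5 * dg1(x) * g1(x)


def euler_maruyama(x0, t_end, n_steps, drift, diffusion, rng):
    """Approximate X_{t_end} for dX = drift(X) dt + diffusion(X) dB (Ito form)."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        increment = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x += drift(x) * dt + diffusion(x) * increment
    return x


def g0(x):
    # Hypothetical drift field.
    return -math.tanh(x)


g1 = math.sin      # hypothetical diffusion field
dg1 = math.cos     # its derivative

corrected = ito_drift(g0, g1, dg1)
sample = euler_maruyama(1.0, 1.0, 1000, corrected, g1, random.Random(0))
```

Note that the scheme must be fed the corrected drift: applying Euler-Maruyama with the uncorrected `g0` would approximate the Itô equation with drift $G_0$, which is a different process from the solution of (6.12).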


6.5.1 Accessibility 


Associated to (6.12) is the deterministic control system

\[ \dot{y} = G_0(y) + \sum_{j=1}^N u_j(t)\, G_j(y), \tag{6.14} \]

where $u : [0, \infty) \to \mathbb{R}^N$ is a control function which can be chosen piecewise continuous or piecewise constant.
We let $t \mapsto y(t, x, u)$ denote the solution to (6.14) whose initial condition is $x$.


Proposition 6.32 Let $p, x \in M$. The following statements are equivalent:


(i) For every neighborhood $U$ of $p$, there exist a control $u$, which can be chosen piecewise continuous or piecewise constant, and $t \ge 0$ such that $y(t, x, u) \in U$;

(ii) The point $p$ is accessible from $x$ for $\{P_t\}_{t \ge 0}$.

Proof This follows from the Stroock-Varadhan support theorem [65], which asserts that the support of the law of $X^x$ equals the closure (in $C(\mathbb{R}_+, M)$) of the set $\{y(\cdot, x, u) : u \text{ piecewise constant}\}$. It is easy to show that the latter also equals the closure of $\{y(\cdot, x, u) : u \text{ piecewise continuous}\}$. □


6.5.2 Hörmander Conditions 


The existence of Doeblin points for the 1-resolvent $G$ or for $P_t$ can be deduced from certain Hörmander conditions that are similar to the bracket conditions introduced in Sects. 6.3.1 and 6.4.2 for PDMPs.

Using the terminology introduced in these sections, we let $\mathcal{L}(G_0, \ldots, G_N)$ denote the Lie algebra generated by $\{G_0, \ldots, G_N\}$, and for all $x \in \mathbb{R}^k$

\[ \mathcal{L}(G_0, \ldots, G_N)(x) = \{V(x) : V \in \mathcal{L}(G_0, \ldots, G_N)\}. \]

We define similarly $\mathcal{L}(G_1, \ldots, G_N)$ and $\mathcal{L}(G_1, \ldots, G_N)(x)$.
Given a point $x \in \mathbb{R}^k$ we shall say that $x$ satisfies the weak Hörmander condition (respectively the Hörmander condition, respectively the strong Hörmander condition) if:


(a) [Weak Hörmander condition] $\mathcal{L}(G_0, \ldots, G_N)(x) = \mathbb{R}^k$;
(b) [Hörmander condition] The family
\[ \mathcal{L}(G_1, \ldots, G_N)(x) \cup \{[X, Y](x) : X, Y \in \mathcal{L}(G_0, \ldots, G_N)\} \]
spans $\mathbb{R}^k$;
(c) [Strong Hörmander condition] $\mathcal{L}(G_1, \ldots, G_N)(x) = \mathbb{R}^k$.


Clearly (c) $\Rightarrow$ (b) $\Rightarrow$ (a). Observe that in (a) all the vector fields, including the drift $G_0$, play the same role. In (b) the drift can only appear in a bracket with some "Brownian" vector field. In (c) only the Brownian vector fields appear.

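To see how the three conditions can differ, here is a small worked example. It is not taken from the text: the vector fields are hypothetical and chosen only so that the brackets are easy to compute (the drift below is not bounded, so the standing boundedness assumption of this section is ignored for the purpose of this algebraic computation). We use the convention $[V, W] = DW\, V - DV\, W$.

```latex
Take $k = 2$, $N = 1$, $G_0(x_1, x_2) = (x_2, 0)$ and $G_1(x_1, x_2) = (0, 1)$. Then
\[
[G_0, G_1](x) = DG_1(x)\, G_0(x) - DG_0(x)\, G_1(x)
= - \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} -1 \\ 0 \end{pmatrix}.
\]
Since $G_1$ is constant, $\mathcal{L}(G_1)(x) = \operatorname{span}\{(0, 1)\} \neq \mathbb{R}^2$,
so the strong H\"ormander condition (c) fails at every point.
On the other hand, $G_1(x) = (0, 1)$ and $[G_0, G_1](x) = (-1, 0)$ span $\mathbb{R}^2$,
so the H\"ormander condition (b), and hence the weak condition (a), holds at every point.
```

In this example the noise acts only on the second coordinate, and it is the bracket with the drift that transports it to the first coordinate.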
A classical theorem in geometric control theory, originally due to W. L. Chow [15], has the following useful consequence:


Proposition 6.33 Let $U \subset \mathbb{R}^k$ be a connected open set. Suppose that the strong Hörmander condition holds at every point $x \in U$. Then for all $x, y \in U$ the point $y$ is accessible for $\{P_t\}_{t \ge 0}$ from the point $x$.


Proof For $\varepsilon \ge 0$ and $u : [0, \infty) \to \mathbb{R}^N$ a piecewise continuous function, let $t \mapsto y^\varepsilon(t, x, u)$ denote the solution to the ordinary differential equation $\dot{y} = \varepsilon G_0(y) + \sum_j u_j(t)\, G_j(y)$ with initial condition $y^\varepsilon(0, x, u) = x$. Chow's theorem (see, e.g., [15] or [41, Chapter 2, Theorem 3]) asserts that for all $x, y \in U$ there exist a piecewise constant control $u$ with values in $\{-1, 0, 1\}$ and $t \ge 0$ such that $y^0(s, x, u) \in U$ for all $0 \le s \le t$ and $y^0(t, x, u) = y$. To shorten notation, set $y^\varepsilon(s) = y^\varepsilon(s, x, u)$ and $y^0(s) = y^0(s, x, u)$. Then, for all $0 \le s \le t$,

\[ \|y^\varepsilon(s) - y^0(s)\| \le \varepsilon \|G_0\|_\infty\, t + K \int_0^s \|y^\varepsilon(r) - y^0(r)\|\, dr, \]

where $K = \sum_{i=1}^N \|DG_i\|_\infty$. Thus, by Gronwall's lemma,

\[ \|y^\varepsilon(t) - y^0(t)\| \le e^{Kt}\, \varepsilon \|G_0\|_\infty\, t, \]
so that $y^\varepsilon(t) \to y$ as $\varepsilon \to 0$. To conclude observe that $y^\varepsilon(s, x, u) = y(\varepsilon s, x, u^\varepsilon)$ with $u^\varepsilon(s) = u(s/\varepsilon)/\varepsilon$. The result then follows from Proposition 6.32.
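The time change used at the end of this proof can be checked numerically. The sketch below is an illustration only, in dimension one with a single control; the fields `g0`, `g1`, the control `u`, and the function names are hypothetical. With matching step sizes, the explicit Euler scheme for the slowed-down system and for the original system with the rescaled control perform the same updates, so the two results agree up to rounding.

```python
import math


def euler_path(x0, n_steps, h, drift_scale, control, g0, g1):
    """Explicit Euler for y' = drift_scale * g0(y) + control(s) * g1(y)."""
    y = x0
    for m in range(n_steps):
        y += h * (drift_scale * g0(y) + control(m * h) * g1(y))
    return y


def g0(y):
    return math.cos(y)


def g1(y):
    return 1.0 + 0.5 * math.sin(y)


def u(s):
    # Piecewise constant control; the jump is placed off the time grid.
    return 1.0 if s < 0.5005 else -1.0


eps, h, n_steps = 0.1, 0.001, 1000

# y^eps(t, x, u): drift slowed down by eps, control u, horizon t = n_steps * h.
slow = euler_path(0.2, n_steps, h, eps, u, g0, g1)

# y(eps * t, x, u^eps) with u^eps(s) = u(s / eps) / eps and step eps * h.
rescaled = euler_path(0.2, n_steps, eps * h, 1.0,
                      lambda s: u(s / eps) / eps, g0, g1)
```

The rescaled control is large (of order $1/\varepsilon$) but acts over the short horizon $\varepsilon t$, which is how a driftless path given by Chow's theorem is approximated by a path of the full control system (6.14).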


The following results, Theorems 6.34 and 6.37, heavily rely on classical papers on hypoelliptic diffusions by Bony [13] and by Ichihara and Kunita [39].


Theorem 6.34 The following statements hold:


(i) Suppose that the weak Hörmander condition holds at $p \in \mathbb{R}^k$. Then $p$ is a Doeblin point for the 1-resolvent $G$;
(ii) Suppose in addition that $p$ is accessible. Then $\{P_t\}_{t \ge 0}$ has at most one invariant probability measure $\mu$. When it exists, $\mu$ is absolutely continuous with respect to $\lambda^k$ (the Lebesgue measure on $\mathbb{R}^k$) and $\mathrm{supp}(\mu) = \Gamma = \overline{\mathrm{Int}(\Gamma)}$, where $\Gamma$ stands for the accessible set of $\{P_t\}$.


Proof


(i) Fix $O$ a neighborhood of $p$, small enough so that $\mathcal{L}(G_0, \ldots, G_N)(x)$ spans $\mathbb{R}^k$ for all $x \in O$.
We say that $p$ is totally degenerate if $\sum_{i=1}^N \|G_i(p)\| = 0$. We distinguish between two cases.
Case 1: $p$ is not totally degenerate.
In this case, there exists a connected open set $D$ containing $p$, relatively compact, with $\overline{D} \subset O$ such that:


(a) For every $x \in \overline{D}$, $\sum_{i=1}^N \|G_i(x)\| \neq 0$;
(b) For every $x \in \partial D = \overline{D} \setminus D$, there exists a vector $u$ normal to $\overline{D}$ at $x$ such that $\sum_{i=1}^N \langle G_i(x), u \rangle^2 > 0$.


Here, by a vector normal to $\overline{D}$ at $x$, we mean that there exists $r > 0$ such that the open ball with center $x + ru$ and radius $r\|u\|$ has empty intersection with $\overline{D}$.

The reason for which such a $D$ exists is the following. We can assume, without loss of generality, that $G_1(p) \neq 0$ and $G_1(p)/\|G_1(p)\| = e_1$, the first vector in the canonical basis of $\mathbb{R}^k$. For $\varepsilon > 0$ small enough, let
\[ D = \{x \in \mathbb{R}^k : \|x - p\|_1 < \varepsilon\}, \]
where $\|u\|_1 = \sum_{i=1}^k |u_i|$. For $x \in \partial D$ let $u_x$ be the vector defined by $u_{x,i} = \frac{x_i - p_i}{|x_i - p_i|}$ if $x_i \neq p_i$ and $u_{x,i} = 1$ otherwise. The vector $u_x$ is normal to $\overline{D}$ at $x$ and $\langle e_1, u_x \rangle^2 = 1$. Hence, for $\varepsilon$ small enough, $\langle G_1(x), u_x \rangle^2 > 0$ for all $x \in \partial D$ and $G_1(x) \neq 0$ for all $x \in \overline{D}$.

The "formal generator" of the diffusion process (6.12) is the operator $L$ acting on $C^2$ functions $f : \mathbb{R}^k \to \mathbb{R}$ by the formula

\[ Lf = G_0(f) + \frac{1}{2} \sum_{i=1}^N G_i^2(f), \tag{6.15} \]

where $G_i(f)(x) = \langle \nabla f(x), G_i(x) \rangle$ and $G_i^2(f) = G_i(G_i(f))$. Under the conditions (a) and (b) above, there exists, by a theorem of Bony [13, Theorem 6.1], a kernel $G_D : \overline{D} \times \overline{D} \to \mathbb{R}_+$, smooth on $D \times D \setminus \{(x, x) : x \in D\}$, such that the following holds:
For every $f \in C_b(D)$, there exists a unique solution $g \in C_b(\overline{D})$ to the Dirichlet problem

\[ \begin{cases} Lg - g = -f & \text{on } D \text{ (in the sense of distributions)}, \\ g|_{\partial D} = 0, \end{cases} \]

and $g(x) = G_D f(x) := \int_D G_D(x, y) f(y)\, dy$. Furthermore, if $f$ is smooth on $D$ so is $g$.

Note that, by continuity of $G_D$ off the diagonal, there exist disjoint open sets $U, V \subset D$ and $\delta > 0$ such that $p \in U$ and $G_D(x, y) \ge \delta$ for every $(x, y) \in U \times V$.

Let $\tau = \inf\{t \ge 0 : X_t^x \notin D\}$. For $f \in C_b(D)$ smooth on $D$, Itô's formula implies that

\[ \Big( e^{-t \wedge \tau} g(X_{t \wedge \tau}^x) + \int_0^{t \wedge \tau} e^{-s} f(X_s^x)\, ds \Big)_{t \ge 0} \]

is a local martingale. Being bounded, it is a uniformly integrable martingale. Thus,

\[ \mathbb{E}\Big( \int_0^\tau e^{-s} f(X_s^x)\, ds \Big) = G_D f(x). \]

It follows that for every $x \in U$ and every Borel set $A \subset \mathbb{R}^k$,
\[ G(x, A) \ge G_D \mathbf{1}_A(x) \ge \delta \lambda^k(A \cap V), \]

proving that $p$ is a Doeblin point for $G$.
Case 2: $p$ is totally degenerate.

Let $\{\Phi_0(t, \cdot)\}$ be the flow induced by $G_0$. We first assume that $k \ge 2$. We claim that it is possible to choose $t > 0$ small enough to ensure that $\Phi_0(t, p)$ lies in $O$ and is not totally degenerate. By what precedes, $\Phi_0(t, p)$ is then a Doeblin point for $G$, and, since it is accessible from $p$ by $\{P_t\}_{t \ge 0}$, this makes $p$ a Doeblin point for $G$ (the proof of this latter assertion is easy and left to the reader). To prove the claim, assume to the contrary that $G_i(\Phi_0(t, p)) = 0$ for all $0 \le t \le \varepsilon$ and $i = 1, \ldots, N$. Then

\[ 0 = \frac{d}{dt} G_i(\Phi_0(t, p)) = DG_i(\Phi_0(t, p))\, G_0(\Phi_0(t, p)) = [G_0, G_i](\Phi_0(t, p)). \]

Similarly $Z(\Phi_0(t, p)) = 0$ for all $Z \in \mathcal{L}(G_0, \ldots, G_N) \setminus \{G_0\}$. This is in contradiction with the assumption that $\mathcal{L}(G_0, \ldots, G_N)$ has rank $k \ge 2$ on $O$.

Suppose now that $k = 1$. If for some $t > 0$ and $i \in \{1, \ldots, N\}$ $G_i(\Phi_0(t, p)) \neq 0$, the point $\Phi_0(t, p)$ is not totally degenerate, and like previously $p$ is a Doeblin point. If for all $t \ge 0$ and $i \in \{1, \ldots, N\}$ $G_i(\Phi_0(t, p)) = 0$, then for all $x \in \{\Phi_0(t, p) : t \ge 0\}$ and $f \ge 0$,

\[ Gf(x) = \int_0^\infty e^{-t} f(\Phi_0(t, x))\, dt \ge e^{-1} \int_0^1 f(\Phi_0(t, x))\, dt = e^{-1} \int_x^{\Phi_0(1, x)} \frac{f(u)}{G_0(u)}\, du. \]

This easily implies that $x$, hence $p$, is Doeblin for $G$.

(ii) Suppose that $p$ is accessible. Then, by Theorem 6.2, $G$ (and hence $\{P_t\}_{t \ge 0}$) has at most one invariant probability measure $\mu$. The minorization $G(x, A) \ge \delta \lambda^k(A \cap V)$ for all $x \in U$ shows that $V \subset \Gamma$. Thus, $\Gamma$ has nonempty interior and consequently (see Proposition 6.2) $\mathrm{supp}(\mu) = \Gamma$. Also, for every piecewise constant control $u$, the map $x \mapsto y(t, x, u)$ is a diffeomorphism. The set $\bigcup_{t,u} y(t, V, u)$, with the union taken over all $t \ge 0$ and $u$ piecewise constant, is then an open set dense in $\Gamma$. It remains to prove that $\mu \ll \lambda^k$. Let $C(\mathbb{R}_+, \mathbb{R}^N)$ be the Wiener space equipped with its Borel $\sigma$-field and the Wiener measure $W(d\omega)$ (i.e., the law of $B = (B_t^1, \ldots, B_t^N)_{t \ge 0}$) and let $\Theta = \mathbb{R}_+ \times C(\mathbb{R}_+, \mathbb{R}^N)$ be equipped with the product measure $m(dt\, d\omega) = e^{-t}\, dt\, W(d\omega)$. Then, for all $f \in B(M)$,

\[ Gf(x) = \int_\Theta f(F_\theta(x))\, m(d\theta), \]

where $F_{(t,\omega)}(x) = X_t^x(\omega)$. Now, for almost all $\omega$ and all $t \ge 0$, the map $x \mapsto X_t^x(\omega) = F_{(t,\omega)}(x)$ is a diffeomorphism (see, e.g., [40, Chapter V] or Kunita [44]). We are then in the situation already considered in Theorem 6.9 and the proof of Theorem 6.9 applies verbatim.


□


Remark 6.35 Suppose that $\Gamma \neq \emptyset$ and that all the points in $\Gamma$ satisfy the weak Hörmander condition. Then the density of $\mu$ (when $\mu$ exists) is $C^\infty$. Indeed, let $U$ be a neighborhood of $\Gamma$ such that all the points in $U$ satisfy the weak Hörmander condition. By Hörmander's theorem [38], $L$ and $L^*$ are hypoelliptic operators in $U$, meaning that for every distribution $f$ on $U$, $Lf \in C^\infty(U) \Rightarrow f \in C^\infty(U)$. If $f = \frac{d\mu}{d\lambda^k}$, $L^* f = 0$ so that $f$ is smooth.


Remark 6.36 Suppose that all the points in $M$ satisfy the strong Hörmander condition (and, in case $M$ is a manifold, $M$ is connected). Then $\Gamma = M$ and the density of $\mu$ (when $\mu$ exists) is positive everywhere. The first statement follows from Proposition 6.33 and the second from Bony's maximum principle [13, Corollaire 3.1] applied to $L^*$.
Theorem 6.37 Let $p \in \mathbb{R}^k$. Suppose that the Hörmander condition holds at $p$. Then $p$ is a Doeblin point for some $P_t$ with $t > 0$. If furthermore $p$ is accessible, then for all $s > 0$

\[ \mathrm{Inv}(P_s) = \mathrm{Inv}(\{P_t\}) \]

and $\mathrm{Inv}(P_s)$ has at most cardinality one.


Proof Let $D$ be a neighborhood of $p$ on which the Hörmander condition holds. Then the law of $(X_t)$ killed at $\partial D$ (see Ichihara and Kunita [39]) has a density $q_t(x, y)$ which is $C^\infty$ in $t > 0$, $x, y \in D$. Thus, $q_t(p, q) > 0$ for some $t > 0$ and $q \in D$. This makes $p$ a Doeblin point for $P_t$. The second statement follows from Proposition 6.3. □


Notes 


The material on random switching between vector fields in Sect. 6.3 is based on 
[8] and [5]. More background on the weak and strong bracket conditions with an 
emphasis on how they relate to controllability is provided in [66]. Proposition 6.3 
and the material in Sect. 6.5 are based on [6] and [10]. The first proof that, under a
weak Hörmander condition at an accessible point, an SDE has at most one invariant
probability measure goes back to Arnold and Kliemann [2]. The proof given here
(of Theorem 6.34) is based on the notes [6] and differs from the proof in [2].


Chapter 7
Harris and Positive Recurrence


Positive recurrent chains are uniquely ergodic chains for which the Birkhoff ergodic 
theorem (i.e., the strong law of large numbers) holds true for every initial condition. 
Harris recurrent chains are chains which satisfy a weaker form of recurrence 
(defined below). It turns out that Harris recurrent chains possessing an invariant 
probability measure are positive recurrent. Several criteria ensuring Harris and 
positive recurrence are given in this chapter. These are applied in the final section to 
Feller chains, piecewise deterministic Markov processes, and stochastic differential 
equations. 


7.1 Stability and Positive Recurrence 


Let $(X_n)$ denote a Markov chain (defined on $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$) on $M$ with kernel $P$. Recall that we let

\[ \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i} \]

denote its empirical occupation measure.
If there exists $\pi \in \mathcal{P}(M)$ such that, for all $x \in M$ and every bounded continuous (respectively measurable) function $f : M \to \mathbb{R}$,

\[ \mathbb{P}_x\big(\lim_{n \to \infty} \nu_n f = \pi f\big) = 1, \]

the kernel $P$ (or the chain $(X_n)$) is called stable, respectively positive recurrent.

If $P$ is stable, then it is clearly uniquely ergodic with invariant probability measure $\pi$, where $\pi$ is the probability measure appearing in the definition.

The following partial converse follows from Theorem 4.20.
Proposition 7.1 Suppose that $P$ is Feller, uniquely ergodic, and that for all $x \in M$, $\{\nu_n\}$ is $\mathbb{P}_x$-almost surely tight. Then $P$ is stable.


Remark 7.2 A stable Feller Markov chain is not necessarily positive recurrent. For instance, let $X_n \in [-2, 2]$ be recursively defined as

\[ X_{n+1} = \frac{1}{2} X_n + \xi_{n+1}, \]

where $(\xi_n)$ are independent uniformly distributed random variables taking values in $\{-1, 1\}$. Then $(X_n)$ is Feller and uniquely ergodic (see, e.g., Exercise 3.9 or Theorem 4.31), hence stable. It is not hard to prove that $\pi$, its stationary distribution, is the uniform distribution over $[-2, 2]$. On the other hand, for $X_0 = 0$, $X_n \in D = \{\sum_{k=0}^m 2^{-k} \theta_k : \theta_k \in \{-1, 1\}, m \in \mathbb{N}\}$ so that $\nu_n(D) = 1$ while $\pi(D) = 0$.
Another example (borrowed from [22]) is the following. Let P be the kernel on 
[0, co) defined by P(0,0) = 1 and, for x > 0, P(x,0) = 1 — P(x, x/2) = 2™. 
This kernel is $\delta_0$-irreducible, Feller, and admits $\delta_0$ as the unique invariant probability measure. It is stable (since $X_n \le 2^{-n} X_0$) but is not positive recurrent because the probability that $X_n$ never touches 0 is positive.
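The first example of Remark 7.2 can be explored with exact rational arithmetic. The sketch below is only an illustration (the function names are ours): starting from $X_0 = 0$, every iterate is a dyadic rational in $[-2, 2]$, so the orbit stays inside a countable set, which has measure zero for the uniform distribution.

```python
import random
from fractions import Fraction


def orbit(n_steps, seed=0):
    """Iterate X_{n+1} = X_n / 2 + xi_{n+1} from X_0 = 0 with xi = +1 or -1."""
    rng = random.Random(seed)
    x = Fraction(0)
    path = [x]
    for _ in range(n_steps):
        x = x / 2 + rng.choice((-1, 1))
        path.append(x)
    return path


def is_dyadic(q):
    """True when the denominator of the reduced fraction is a power of two."""
    d = q.denominator
    return d & (d - 1) == 0


path = orbit(60)
all_dyadic = all(is_dyadic(x) for x in path)
all_bounded = all(abs(x) <= 2 for x in path)
```

Testing the empirical measure against the indicator function of the set of dyadic rationals therefore gives 1 at every step, while the stationary distribution gives 0. This is the measurable (but not continuous) test function that separates stability from positive recurrence here.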


Proposition 7.3 Suppose that $P$ is strong Feller and stable. Then $P$ is positive recurrent.


Proof If $P$ is strong Feller, then, for every bounded measurable $f$, $Pf$ is continuous so that $\nu_n(Pf) - \pi(Pf) \to 0$, $\mathbb{P}_x$-almost surely. By invariance of $\pi$, $\pi(Pf) = \pi f$ and, as shown in the proof of Theorem 4.20, $\nu_n(Pf) - \nu_n f \to 0$, $\mathbb{P}_x$-almost surely. □


Remark 7.4 A Feller (even strong Feller) uniquely ergodic kernel on a noncompact space is not necessarily stable. For instance, let $P$ be the kernel on $\mathbb{N}$ defined as $P(0, 0) = 1$ and, for $n \ge 1$, $P(n, n-1) = 1 - p$, $P(n, n+1) = p$ with $1 > p > 1/2$. Then $\delta_0$ is the unique invariant probability measure of this Markov chain but the chain is not stable since $\mathbb{P}_x(X_n \to \infty) > 0$ for all $x > 0$. Another (similar) example on $\mathbb{R}^k$ is given by the deterministic linear dynamical system $X_{n+1} = aX_n$ with $a > 1$.
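For the random walk in Remark 7.4, the escape probability has a classical closed form coming from the gambler's ruin problem, which is not stated in the text: $\mathbb{P}_x(X_n \to \infty) = 1 - ((1-p)/p)^x$. The sketch below (with a function name of our own) checks that this expression vanishes at 0 and is harmonic for the walk on $\{1, 2, \ldots\}$, as an escape probability must be.

```python
def escape_probability(x, p):
    """Closed form 1 - ((1 - p) / p) ** x for the walk absorbed at 0, p > 1/2."""
    r = (1.0 - p) / p
    return 1.0 - r ** x


p = 0.6
# Harmonicity away from 0: h(x) = p * h(x + 1) + (1 - p) * h(x - 1).
max_defect = max(
    abs(escape_probability(x, p)
        - p * escape_probability(x + 1, p)
        - (1 - p) * escape_probability(x - 1, p))
    for x in range(1, 30)
)
```

Since the value is strictly positive for every $x \ge 1$, the empirical occupation measures started from such an $x$ fail to converge to $\delta_0$ with positive probability.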


Exercise 7.5 Let $(X_n)$ be the deterministic system on $S^1 = \mathbb{R}/\mathbb{Z}$ defined by $X_{n+1} = (X_n + a) \bmod 1$, where $a \in \mathbb{R} \setminus \mathbb{Q}$. Show that $(X_n)$ is stable but not positive recurrent.
7.2 Harris Recurrence


The chain $(X_n)$ is called Harris recurrent if there exists a nonzero measure $\xi$ such that for every Borel set $A \subset M$ and every $x \in M$,

\[ \xi(A) > 0 \Rightarrow \mathbb{P}_x\big(\limsup_{n \to \infty} \mathbf{1}_A(X_n) = 1\big) = 1. \]

Note that a Harris recurrent chain is $\xi$-irreducible. The converse is false as shown by the following example.


Example 7.6 Let $P$ be the Markov transition matrix on $M = \mathbb{N}$ defined by $P(i, i+1) = p_i$ and $P(i, 0) = 1 - p_i$, where $p_0 = 0$, $p_i > 0$ for $i \ge 1$, and $\prod_{i \ge 1} p_i > 0$. Then the associated chain is $\delta_0$-irreducible but not Harris recurrent.
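In Example 7.6, the chain started at 1 avoids 0 forever exactly when it moves up at every step, which happens with probability $\prod_{i \ge 1} p_i$. The sketch below evaluates this for one hypothetical choice of the sequence, $p_i = 1 - (i+1)^{-2}$, which is our own and not from the text. For this choice the partial products telescope to $(n+2)/(2(n+1))$, so the infinite product equals $1/2$.

```python
def p(i):
    # Hypothetical transition probabilities with a positive infinite product.
    return 1.0 - 1.0 / (i + 1) ** 2


def never_return_probability(n_terms):
    """Partial product p_1 * ... * p_n: probability of n consecutive up-moves."""
    product = 1.0
    for i in range(1, n_terms + 1):
        product *= p(i)
    return product


partial = never_return_probability(1000)
telescoped = (1000 + 2) / (2 * (1000 + 1))
```

With positive probability the chain therefore never visits $\{0\}$, although $\delta_0(\{0\}) > 0$, which is why it cannot be Harris recurrent.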


Recall that a harmonic function is a measurable function $h : M \to \mathbb{R}$ such that $Ph = h$.


Theorem 7.7 Suppose that $(X_n)$ is Harris recurrent. Then every bounded harmonic function is constant.


Proof Let $h$ be bounded and harmonic. Let $(X_n^x)$ denote the chain having $P$ as Markov kernel and initial condition $X_0^x = x$. Then $Y_n = h(X_n^x)$ is a bounded (in particular uniformly integrable) martingale. Hence, by Doob's convergence theorem (Theorem A.7 in the appendix), $\lim_{n \to \infty} Y_n = Y_\infty$ exists almost surely and $\mathbb{E}(Y_\infty \mid \mathcal{F}_n) = Y_n$. Given $a \in \mathbb{R}$, let $\{h \ge a\}$ (respectively $\{h \le a\}$, $\{h = a\}$) be the set of $u \in M$ such that $h(u) \ge a$ (respectively $\le$, $=$). If $\xi(\{h \ge a\}) > 0$, then $(X_n^x)$ enters $\{h \ge a\}$ infinitely often. Thus $Y_\infty \ge a$ so that $Y_n = \mathbb{E}(Y_\infty \mid \mathcal{F}_n) \ge a$. In particular, $h(x) = Y_0 \ge a$. Similarly if $\xi(\{h \le a\}) > 0$ then $h(x) \le a$. Let now $a$ be such that $\{h = a\} \neq \emptyset$. Then $\xi(\{h \neq a\}) = \xi\big(\bigcup_{n \in \mathbb{N}} \{a - (n+1)^{-1} < h < a + (n+1)^{-1}\}^c\big) = 0$. This proves that $h = a$. □


Positive recurrence and Harris recurrence are intimately linked as shown by the next important theorem.


Theorem 7.8 The following assertions are equivalent:


(a) $P$ is Harris recurrent and $\mathrm{Inv}(P) \neq \emptyset$;

(b) $P$ is positive recurrent;

(c) There exists $\pi \in \mathrm{Inv}(P)$ such that for all $f \in L^1(\pi)$ and every initial distribution $\mu$,

\[ \mathbb{P}_\mu\big(\lim_{n \to \infty} \nu_n(f) = \pi(f)\big) = 1. \]
Proof (c) $\Rightarrow$ (b) $\Rightarrow$ (a) is immediate. Conversely, if $P$ is Harris recurrent with an invariant probability measure $\pi$, then $P$ is uniquely ergodic. Let $f \in L^1(\pi)$, $\mathcal{A} = \{\omega \in M^{\mathbb{N}} : \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n f(\omega_k) = \pi f\}$, and $g(x) = \mathbb{P}_x(\mathcal{A})$. By the ergodic theorem, $g(x) = 1$, $\pi$-almost surely. We now claim that $g$ is harmonic, which with Theorem 7.7 proves the result. To prove the claim we use the invariance of $\mathcal{A}$ under the shift $\Theta$ and the Markov property:

\[ g(x) = \mathbb{E}_x(\mathbf{1}_{\mathcal{A}}) = \mathbb{E}_x(\mathbf{1}_{\mathcal{A}} \circ \Theta) = \mathbb{E}_x\big(\mathbb{E}_x(\mathbf{1}_{\mathcal{A}} \circ \Theta \mid \mathcal{F}_1)\big) = \mathbb{E}_x(g(X_1)) = Pg(x). \]

□


Theorem 7.9 Suppose $P$ is strong Feller and uniquely ergodic with an invariant probability measure $\pi$ having full support. Then the equivalent conditions of Theorem 7.8 hold true.


Proof Let $f \in L^1(\pi)$ and let $g$ be defined as in the proof of Theorem 7.8. We have seen that $g$ is harmonic. Since $P$ is strong Feller, $g$ is continuous and, by the ergodic theorem, $g(x) = 1$ for $\pi$-almost all $x$. The set $\{x \in M : g(x) = 1\}$ is then a closed set containing the support of $\pi$. Since $\pi$ has full support, $g = 1$ and $P$ is positive recurrent. □


Corollary 7.10 Suppose $P$ is strong Feller with an invariant probability measure $\pi$ having full support. If $M$ is connected, then the equivalent conditions of Theorem 7.8 hold true.


Proof This follows from Theorem 7.9 and Proposition 5.18. □


7.2.1 Petite Sets and Harris Recurrence 


A convenient and practical way to ensure that a chain is Harris recurrent is to exhibit a recurrent petite set.

Given a Borel set $C \subset M$ we say that $x \in M$ leads almost surely to $C$ if $\mathbb{P}_x(\tau_C < \infty) = 1$, where

\[ \tau_C = \min\{k \ge 1 : X_k \in C\}. \]

We say that $C$ is recurrent if every $x \in M$ leads almost surely to $C$.
For further reference, we define the successive return times in $C$ recursively by

\[ \tau_C^{(n+1)} = \min\{k > \tau_C^{(n)} : X_k \in C\} \]

with $\tau_C^{(0)} = 0$.
Proposition 7.11 Let $C \subset M$ be a recurrent petite set. Then $(X_n)$ is Harris recurrent.


Proof It easily follows from the definition of a petite set (see Sect. 6.1) that for all $x \in C$ and $A$ Borel, $\mathbb{P}_x(\tau_A < \infty) \ge \xi(A)$. Thus, using the strong Markov property, for all $x \in M$,

\[ \mathbb{P}_x(\tau_A < \infty) \ge \mathbb{P}_x(\exists k > \tau_C : X_k \in A) = \mathbb{E}_x\big(\mathbb{P}_{X_{\tau_C}}(\tau_A < \infty)\big) \ge \xi(A). \]

Therefore, by the Markov property, for all $n \in \mathbb{N}$
\[ \mathbb{P}_x(\tau_A < \infty \mid \mathcal{F}_n) \ge \mathbb{P}_{X_n}(\tau_A < \infty) \ge \xi(A). \]

The first term of this inequality converges to $\mathbf{1}_{\tau_A < \infty}$ (see Theorem A.7 in the appendix). Thus $\mathbb{P}_x(\tau_A < \infty) = 1$ for all $x$ whenever $\xi(A) > 0$. By the strong Markov property, this implies that $X_n \in A$ infinitely often. □


7.3 Recurrence Criteria and Lyapunov Functions 


We discuss here simple useful criteria, based on Lyapunov functions, ensuring that a 
set is recurrent. They also provide moment estimates of the return times. Conditions 
(a) and (b) of the next result are folklore (see the notes at the end of the chapter). 
We learned condition (a') from Philippe Robert (see [60], Proposition 8 in Chapter 8).


Proposition 7.12 Let $V : M \to [1, \infty)$ be a measurable map and $C \subset M$ a Borel
set. Assume that for all $x \in C$, $PV(x) < \infty$ and that one of the three following
conditions holds:

(a) $PV - V \le -1$ on $M \setminus C$;
(a') Condition (a) and $\sup_{x \in M} \mathbb{E}_x(|V(X_1) - V(x)|^p) < \infty$ for some $p > 1$;
(b) $PV - V \le -\lambda V$ on $M \setminus C$ for some $1 > \lambda > 0$.

Then for all $x \in M$

(i) $\mathbb{E}_x(\tau_C) \le PV(x) + 1$ under condition (a);
(ii) $\mathbb{E}_x(\tau_C^p) \le c(1 + V^p(x))$ for some constant $c > 0$ under condition (a');
(iii) $\mathbb{E}_x(e^{\lambda \tau_C}) \le \mathbb{E}_x(e^{-\log(1-\lambda)\tau_C}) \le \frac{1}{1-\lambda} PV(x)$ under condition (b).

In particular, $C$ is a recurrent set.
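Proof Let $V_n = V(X_{n \wedge \tau_C}) + (n \wedge \tau_C)$. Then $(V_n)_{n \ge 1}$ is a supermartingale. Indeed,
for all $n \ge 1$,

\[ \mathbb{E}(V_{n+1} - V_n \mid \mathcal{F}_n) = \mathbb{E}(V_{n+1} - V_n \mid \mathcal{F}_n)\mathbf{1}_{\tau_C > n} = (PV(X_n) + 1 - V(X_n))\mathbf{1}_{\tau_C > n} \le 0. \]

Thus $\mathbb{E}_x(n \wedge \tau_C) \le \mathbb{E}_x(V_n) \le \mathbb{E}_x(V_1) = PV(x) + 1$. This proves the first assertion.

The proof of assertion (iii) is similar: Set $V_n = \frac{V(X_{n \wedge \tau_C})}{(1-\lambda)^{n \wedge \tau_C}}$. Then $(V_n)_{n \ge 1}$ is a
supermartingale. Thus

\[ \mathbb{E}_x\big(e^{-\log(1-\lambda)(n \wedge \tau_C)}\big) \le \mathbb{E}_x(V_n) \le \mathbb{E}_x(V_1) = \frac{PV(x)}{1-\lambda}. \]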


We now prove assertion (ii), following Robert ([60], Proposition 8, Chapter 8).
We claim that for all $x \ge -1$

\[ (1 + x)^p \le 1 + px + C_p\, r(x), \tag{7.1} \]

where

\[ r(x) = x^2(1 + |x|)^{p-2} \quad \text{and} \quad C_p = \frac{p(p-1)}{2} \]

for $p \ge 2$; and

\[ r(x) = |x|^p \quad \text{and} \quad C_p = 1 \]

for $1 < p < 2$. Indeed, by the Taylor-Lagrange formula, for all $x \ge -1$,

\[ (1 + x)^p = 1 + px + p(p-1)x^2 R(x) \]

with $R(x) = \int_0^1 (1 - s)(1 + sx)^{p-2}\, ds$. Thus $|R(x)| \le \frac{1}{2}(1 + |x|)^{p-2}$ for $p \ge 2$.
For $1 < p < 2$ and $x \ge 0$, $|R(x)| \le \int_0^1 (1 - s)s^{p-2}x^{p-2}\, ds = \frac{x^{p-2}}{p(p-1)}$,
while for $1 < p < 2$ and $-1 < x < 0$ one has $|R(x)| \le 1$ (because $s \in [0, 1] \mapsto
(1-s)(1+sx)^{p-2}$ is decreasing, hence bounded above by 1). This proves the claim.
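Now set

\[ Z_n = 1 + \varepsilon\left(V(X_n) + \frac{n}{2}\right), \]

where $\varepsilon > 0$ and

\[ \Delta_{n+1} = V(X_{n+1}) - V(X_n) + \frac{1}{2}. \]

Then

\[ Z_{n+1}^p = Z_n^p\left(1 + \frac{\varepsilon\Delta_{n+1}}{Z_n}\right)^p \]

so that by (7.1) and condition (a),

\[ \mathbb{E}(Z_{n+1}^p \mid \mathcal{F}_n) - Z_n^p \le Z_n^p\left(-\frac{p\varepsilon}{2Z_n} + C_p\, \mathbb{E}\left(r\left(\frac{\varepsilon\Delta_{n+1}}{Z_n}\right) \Bigm| \mathcal{F}_n\right)\right) \]

on the event $\{\tau_C > n\}$. Now, it is easy to check that $r\left(\frac{\varepsilon\Delta_{n+1}}{Z_n}\right) \le \frac{\varepsilon^2}{Z_n}(1 + |\Delta_{n+1}|)^p$ for
$p \ge 2$, and $r\left(\frac{\varepsilon\Delta_{n+1}}{Z_n}\right) \le \frac{\varepsilon^p}{Z_n}|\Delta_{n+1}|^p$ for $1 < p < 2$. Thus, for $\varepsilon > 0$ small enough,
conditions (a) and (a') make $(Z^p_{n \wedge \tau_C})$ a supermartingale. We can then conclude as
in the proof of (i). □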
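As a rough numerical illustration of assertion (i) of Proposition 7.12, the sketch below checks the bound $\mathbb{E}_x(\tau_C) \le PV(x) + 1$ on a small birth-death chain. The chain on $\{0, \ldots, N\}$, the choice $C = \{0\}$, and the function $V(x) = 1 + x/(q - p)$ are assumptions made only for this example; the return times are approximated by iterating the identity $h(x) = 1 + \sum_{y \notin C} P(x, y)h(y)$.

```python
# Toy check of Proposition 7.12 (i): E_x(tau_C) <= PV(x) + 1.
# The chain, V and C = {0} are illustrative assumptions, not taken from the text.

N = 20                 # state space {0, ..., N}
UP, DOWN = 0.3, 0.7    # probabilities of moving up / down (drift towards 0)


def kernel():
    """Transition matrix of the birth-death chain, as a list of rows."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0], P[0][1] = DOWN, UP            # at 0: stay or move up
    for x in range(1, N):
        P[x][x - 1], P[x][x + 1] = DOWN, UP
    P[N][N - 1], P[N][N] = DOWN, UP        # at N: move down or stay
    return P


def lyapunov(x):
    """V >= 1 chosen so that PV - V <= -1 outside C = {0}."""
    return 1.0 + x / (DOWN - UP)


def apply_kernel(P, f):
    """Compute Pf(x) = sum_y P(x, y) f(y) for every x."""
    return [sum(pxy * fy for pxy, fy in zip(row, f)) for row in P]


def expected_return_times(P, C, sweeps=3000):
    """Approximate h(x) = E_x(tau_C), tau_C = min{k >= 1 : X_k in C},
    by iterating h <- 1 + (P restricted to the complement of C) h."""
    n = len(P)
    h = [0.0] * n
    for _ in range(sweeps):
        h = [1.0 + sum(P[x][y] * h[y] for y in range(n) if y not in C)
             for x in range(n)]
    return h


P = kernel()
V = [lyapunov(x) for x in range(N + 1)]
PV = apply_kernel(P, V)
h = expected_return_times(P, C={0})

drift_ok = all(PV[x] - V[x] <= -1 + 1e-9 for x in range(1, N + 1))
bound_ok = all(h[x] <= PV[x] + 1 + 1e-9 for x in range(N + 1))
print(drift_ok, bound_ok)
```

The iteration starts from zero and increases monotonically to the true expected return times, so the comparison with $PV + 1$ is not affected by stopping after finitely many sweeps.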


Remark 7.13 If $V$ is a Lyapunov function in the sense that $PV \le \rho V + \kappa$ with
$0 \le \rho < 1$ and $\kappa \ge 0$, then the assumptions of Proposition 7.12 (b) hold true with
$0 < \lambda < 1 - \rho$ and $C = \{x \in M : V(x) \le \frac{\kappa}{1 - \rho - \lambda}\}$. Compare to Proposition 4.23.


The next proposition extends Proposition 7.12 and gives an
alternative condition (to conditions (a), (a')) to control the moments of $\tau_C$. The
proof is based on a beautiful argument used in section 4.1 of Hairer's notes [31].


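Proposition 7.14 Let $V : M \to [1, \infty)$ be a measurable map and $C \subset M$ a Borel
set. Let $\varphi : [0, \infty) \to \mathbb{R}_+^*$ be a concave $C^1$-function and let $h : [1, \infty) \to [0, \infty)$
be the map defined by

\[ h(x) = \int_1^x \frac{du}{\varphi(u)}. \]

Assume that for all $x \in C$, $PV(x) < \infty$ and that for all $x \in M \setminus C$

\[ PV(x) - V(x) \le -\varphi(V(x)). \]

Then, for all $x \in M \setminus C$,

\[ \mathbb{E}_x(h^{-1}(\tau_C)) \le V(x) \]

and, for all $x \in C$,

\[ \mathbb{E}_x(h^{-1}(\tau_C)) \le h^{-1}(h(PV(x)) + 1). \]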


Proof First observe that $\varphi' \ge 0$ (for otherwise by concavity $\varphi$ could not be $> 0$).
For $x \ge 1$ and $t \ge 0$ set $H(t, x) = h^{-1}(h(x) + t)$. It is readily seen that

\[ \frac{\partial H}{\partial t}(t, x) = \varphi(H(t, x)) = \varphi(x)\frac{\partial H}{\partial x}(t, x). \tag{7.2} \]

Thus

\[ \frac{\partial^2 H}{\partial x^2}(t, x) = \frac{\big(\varphi'(H(t, x)) - \varphi'(x)\big)\varphi(H(t, x))}{\varphi(x)^2} \le 0. \]

In particular, $H$ is convex in $t$ and concave in $x$.
It follows that for all $n \ge 0$

\[ H(n+1, V(X_{n+1})) - H(n, V(X_n)) = \big[H(n+1, V(X_{n+1})) - H(n+1, V(X_n))\big] + \big[H(n+1, V(X_n)) - H(n, V(X_n))\big] \]
\[ \le \frac{\partial H}{\partial x}(n+1, V(X_n))\big(V(X_{n+1}) - V(X_n)\big) + \frac{\partial H}{\partial t}(n+1, V(X_n)). \]

Therefore, on the event $\{X_n \notin C\}$,

\[ \mathbb{E}\big(H(n+1, V(X_{n+1})) - H(n, V(X_n)) \mid \mathcal{F}_n\big) \le -\frac{\partial H}{\partial x}(n+1, V(X_n))\varphi(V(X_n)) + \frac{\partial H}{\partial t}(n+1, V(X_n)) \le 0. \]

Here the first inequality follows from the hypotheses on $V$ and the second one from
Eq. (7.2). This makes the process $(H(n \wedge \tau_C, V(X_{n \wedge \tau_C})))_{n \ge 1}$ a supermartingale. Thus

\[ \mathbb{E}_x(h^{-1}(n \wedge \tau_C)) \le \mathbb{E}_x\big(H(n \wedge \tau_C, V(X_{n \wedge \tau_C}))\big) \le \mathbb{E}_x(H(1, V(X_1))) \le H(1, PV(x)), \]

where the last inequality follows from concavity of $H$ in $x$ and Jensen's inequality.
In the case $x \in M \setminus C$, by monotonicity and concavity of $h$,

\[ h(PV(x)) \le h\big(V(x) - \varphi(V(x))\big) \le h(V(x)) - h'(V(x))\varphi(V(x)) = h(V(x)) - 1. \]

Thus $H(1, PV(x)) \le V(x)$. This proves the result. □
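To see what Proposition 7.14 gives in practice, here are two worked choices of $\varphi$. Both are illustrations chosen for this note rather than examples taken from the text; the constants $c > 0$ and $0 < \alpha < 1$ are hypothetical parameters.

```latex
% Case 1: \varphi \equiv 1 (the drift condition (a) of Proposition 7.12).
\[
  h(x) = \int_1^x du = x - 1, \qquad h^{-1}(t) = 1 + t ,
\]
so that, for $x \in M \setminus C$ and $x \in C$ respectively,
\[
  \mathbb{E}_x(\tau_C) \le V(x) - 1 ,
  \qquad
  \mathbb{E}_x(\tau_C) \le PV(x) .
\]

% Case 2: \varphi(v) = c(1+v)^{\alpha}, which is positive, concave and C^1 on [0,\infty).
\[
  h(x) = \frac{(1+x)^{1-\alpha} - 2^{1-\alpha}}{c(1-\alpha)},
  \qquad
  h^{-1}(t) = \bigl(2^{1-\alpha} + c(1-\alpha)\,t\bigr)^{\frac{1}{1-\alpha}} - 1 ,
\]
so that, for $x \in M \setminus C$,
\[
  \mathbb{E}_x\Bigl[\bigl(2^{1-\alpha} + c(1-\alpha)\,\tau_C\bigr)^{\frac{1}{1-\alpha}}\Bigr]
  \le 1 + V(x) ,
\]
which controls the moment of order $1/(1-\alpha)$ of $\tau_C$.
```

The first case is consistent with assertion (i) of Proposition 7.12; the second shows how a sublinear drift yields polynomial moments of the return time.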


7.4 Subsets of Recurrent Sets 


Let $C \subset M$ be a recurrent set for the chain $(X_n)$ (for instance the sublevel set
$\{V \le R\}$ of a Lyapunov function) and $U \subset C$ a measurable smaller subset (for instance
the neighborhood of a Doeblin point). It is often desirable to deduce recurrence
properties of $U$ from recurrence properties of $C$. This short section discusses two
such results.

The induced chain on $C$ is the process $(Y_n)_{n \ge 1}$ defined as

\[ Y_n = X_{\tau_C^{(n)}}. \]

Exercise 7.15 Verify that $(Y_n)_{n \ge 1}$ is a Markov chain on $C$.


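Proposition 7.16 Let $C \subset M$ be a nonempty recurrent set and $U \subset C$ a
measurable subset. Suppose that there exist $k \ge 1$ and $0 < \varepsilon < 1$ such that for
all $x \in C$

\[ \mathbb{P}_x(\exists i \in \{1, \ldots, k\} : Y_i \in U) \ge \varepsilon, \]

where $(Y_n)$ is the induced chain on $C$. Then

(i) $U$ is recurrent;
(ii) If $\sup_{x \in C} \mathbb{E}_x(\tau_C^p) < \infty$ for some $p \ge 1$, then

\[ \sup_{x \in C} \mathbb{E}_x(\tau_U^p) < \infty; \]

(iii) If $\sup_{x \in C} \mathbb{E}_x(e^{\lambda_0 \tau_C}) < \infty$ for some $\lambda_0 > 0$, then

\[ \sup_{x \in C} \mathbb{E}_x(e^{\lambda \tau_U}) < \infty \]

for some $0 < \lambda \le \lambda_0$.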


Proof For all $x \in M$, $\mathbb{P}_x$-almost surely,

\[ \mathbf{1}_{\tau_U < \infty} = \lim_{n \to \infty} \mathbb{P}_x(\tau_U < \infty \mid \mathcal{F}_{\tau_C^{(n)}}) \ge \liminf_{n \to \infty} \mathbb{P}_{Y_n}(\tau_U < \infty) \ge \varepsilon. \]

Here the equality follows from the martingale convergence theorem A.7 and
the first inequality from the strong Markov property. This proves that $U$ is recurrent.

Let $\sigma_U = \min\{n \ge 1 : Y_n \in U\}$. The proofs of assertions (ii) and (iii) now
follow from the identity $\tau_U = \tau_C^{(\sigma_U)}$ exactly as in the proof of Proposition 2.18
(i), (ii). The verification is an easy exercise left to the reader. □


When P is Feller, the existence of a compact recurrent set C makes every 
accessible open set U recurrent. More precisely, 


Proposition 7.17 Suppose that $P$ is Feller. Let $C \subset M$ be a nonempty compact set,
$x^* \in M$ an accessible point from $C$ (i.e., $x^* \in \Gamma_C$) and $U$ a neighborhood of $x^*$.

(i) If $C$ is recurrent, so is $U$;
(ii) If $U \subset C$ and $\sup_{x \in C} \mathbb{E}_x(\tau_C^p) < \infty$ for some $p \ge 1$, then

\[ \sup_{x \in C} \mathbb{E}_x(\tau_U^p) < \infty; \]

(iii) If $U \subset C$ and $\sup_{x \in C} \mathbb{E}_x(e^{\lambda_0 \tau_C}) < \infty$ for some $\lambda_0 > 0$, then

\[ \sup_{x \in C} \mathbb{E}_x(e^{\lambda \tau_U}) < \infty \]

for some $0 < \lambda \le \lambda_0$.


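Proof For $\varepsilon > 0$ and $i \in \mathbb{N}^*$ let $O(\varepsilon, i) = \{x \in M : P^i(x, U) > \varepsilon\}$. By Feller
continuity and the Portmanteau theorem 4.1, $O(\varepsilon, i)$ is an open set. By accessibility
of $x^*$, the family $\{O(\varepsilon, i) : \varepsilon > 0, i \in \mathbb{N}^*\}$ covers $C$. Thus, by compactness, there
exist $\varepsilon > 0$ and a finite set $I \subset \mathbb{N}$ such that $C \subset \bigcup_{i \in I} O(\varepsilon, i)$. This shows that, for
all $x \in C$,

\[ \mathbb{P}_x(\tau_U \le k) \ge \varepsilon \tag{7.4} \]

with $k$ the largest element of $I$. Assertions (ii) and (iii) then follow from
Proposition 7.16 because, for all $x \in C$,

\[ \mathbb{P}_x(\exists i \in \{1, \ldots, k\} : Y_i \in U) \ge \mathbb{P}_x(\tau_U \le k) \ge \varepsilon. \]

The proof of the first assertion is similar to the proof of the first assertion in
Proposition 7.16. Namely, for all $x \in M$, $\mathbb{P}_x$-almost surely,

\[ \mathbf{1}_{\tau_U < \infty} = \lim_{n \to \infty} \mathbb{P}_x(\tau_U < \infty \mid \mathcal{F}_{\tau_C^{(n)}}) \ge \liminf_{n \to \infty} \mathbb{P}_{X_{\tau_C^{(n)}}}(\tau_U < \infty) \ge \varepsilon. \]

Thus $\mathbb{P}_x(\tau_U < \infty) = 1$. □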


7.5 Petite Sets and Positive Recurrence 


We have seen (Proposition 7.11) that the existence of a recurrent petite set for a 
Markov chain implies Harris recurrence of the chain. If, in addition, the return times 
to the set are bounded in $L^1$, then the chain is positive recurrent.


Theorem 7.18 Let $C \subset M$ be a recurrent petite set such that

\[ \sup_{x \in C} \mathbb{E}_x(\tau_C) < \infty. \]

Then the equivalent conditions of Theorem 7.8 hold true.

Before proving this theorem, we start with a proposition relating the recurrence
properties of the chain $(X_n)$ and the sampled chain $Y_n := X_{T_n}$, where

\[ T_n := \Delta_1 + \cdots + \Delta_n \]

for $n \ge 1$, $T_0 := 0$, and $(\Delta_i)_{i \ge 1}$ is a sequence of i.i.d. random variables taking on
values in $\mathbb{N}$.
Recall that in the particular case where $\Delta_i$ has a geometric distribution with
parameter $a$ (i.e., $\mathbb{P}(\Delta_i = n) = a^n(1 - a)$ for all $n \in \mathbb{N}$), then $(Y_n)$ has kernel $R_a$.
The hazard rate of $\Delta_i$ is the sequence

\[ \Lambda(n) = \mathbb{P}(\Delta_i = n \mid \Delta_i \ge n) = \frac{\mathbb{P}(\Delta_i = n)}{\mathbb{P}(\Delta_i \ge n)}, \quad n \in \mathbb{N}. \]

For a geometric distribution with parameter $a$, the hazard rate is constant and equals
$1 - a$.


Exercise 7.19 Suppose $\Delta_i$ has a negative binomial distribution with parameters
$(a, m)$ (see Exercise 5.2 (ii)). Prove that $\Lambda(n)$ is nondecreasing and converges to
$1 - a$. In particular,

\[ \inf_{n \in \mathbb{N}} \Lambda(n) = \Lambda(0) = (1 - a)^m. \]
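The hazard rate is easy to tabulate numerically. The sketch below evaluates $\Lambda(n) = \mathbb{P}(\Delta = n)/\mathbb{P}(\Delta \ge n)$ for a geometric law and for a negative binomial law. The negative binomial convention used here (law of a sum of $m$ independent geometric variables with parameter $a$) is an assumption, since Exercise 5.2 is not reproduced in this section; the tail is truncated after a fixed number of terms, which is harmless for the parameters used.

```python
from math import comb


def hazard_rate(pmf, n, tail_terms=2000):
    """Lambda(n) = P(D = n) / P(D >= n) for an N-valued law with mass function pmf.
    The tail P(D >= n) is approximated by a truncated sum."""
    tail = sum(pmf(k) for k in range(n, n + tail_terms))
    return pmf(n) / tail


def geometric_pmf(a):
    """P(D = n) = a^n (1 - a), n = 0, 1, 2, ..."""
    return lambda n: a ** n * (1 - a)


def negative_binomial_pmf(a, m):
    """Assumed convention: law of the sum of m independent geometric(a) variables,
    P(D = n) = C(n + m - 1, n) a^n (1 - a)^m."""
    return lambda n: comb(n + m - 1, n) * a ** n * (1 - a) ** m


a, m = 0.6, 3
geo = [hazard_rate(geometric_pmf(a), n) for n in range(10)]
nb = [hazard_rate(negative_binomial_pmf(a, m), n) for n in range(50)]
print(geo[:3], nb[0], nb[-1])
```

For the geometric law the values are constant (equal to $1 - a$ up to rounding), while for the negative binomial law they start at $(1 - a)^m$ and increase towards $1 - a$, in line with Exercise 7.19.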


The next result is an easy consequence of the memoryless property when $\Delta_i$
has a geometric distribution (prove it as an exercise) and this is exactly what we'll
need for the proof of Theorem 7.18. It is however interesting to point out that it
remains valid under the weaker assumption that the hazard rate of $\Delta_i$ is bounded
below. Tom Mountford helped us with the proof of this proposition and suggested
the minorization condition on the hazard rate.


Proposition 7.20 Let $(\Delta_n)$, $(T_n)$ be as above, i.e., $(\Delta_n)$ is an i.i.d. sequence of $\mathbb{N}$-valued
random variables and $T_n := \Delta_1 + \cdots + \Delta_n$. Assume that there is $a \in (0, 1)$
such that

\[ \inf_{n \in \mathbb{N}} \Lambda(n) \ge 1 - a > 0. \]


Let $\mathcal{N} = \{n_1 < n_2 < \ldots < n_k < \ldots\} \subset \mathbb{N}$ be an infinite set of integers and
$\tau_{\mathcal{N}} := \min\{n \ge 1 : T_n \in \mathcal{N}\}$.


Then 


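(i) $\mathbb{P}(\tau_{\mathcal{N}} < \infty) = 1$;
(ii) $\mathbb{P}(T_{\tau_{\mathcal{N}}} > n_i) \le a^i$ for all $i \ge 1$;
(iii) $\mathbb{E}(\Delta_1)\mathbb{E}(\tau_{\mathcal{N}}) \le n_1 + \sum_{i \ge 1} a^i(n_{i+1} - n_i)$;
(iv) If $\Lambda(n) = 1 - a$ for all $n \in \mathbb{N}$ (meaning that $\Delta_i$ has a geometric distribution
with parameter $a$), inequalities (ii) and (iii) are equalities.

Proof

(i) For $n \ge 1$, let $\mathcal{F}_n := \sigma(\Delta_1, \ldots, \Delta_n)$ and $v(n) := \mathbb{P}(\exists i \ge 0 : T_i = n)$. We
claim that $v(n) \ge 1 - a$ for all $n \ge 1$. One has

\[ v(n) = \mathbb{E}\big(\mathbb{P}(\exists i \ge 0 : T_i = n \mid \mathcal{F}_1)\big) = v(n)\mathbb{P}(\Delta_1 = 0) + \mathbb{E}\big(v(n - \Delta_1)\mathbf{1}_{0 < \Delta_1 < n}\big) + \mathbb{P}(\Delta_1 = n). \]

Thus, $v(1) = \Lambda(1) \ge 1 - a$. Suppose now that $v(i) \ge 1 - a$ for $i = 1, \ldots, n - 1$. Then

\[ v(n)\mathbb{P}(\Delta_1 > 0) \ge (1 - a)\mathbb{P}(0 < \Delta_1 < n) + \mathbb{P}(\Delta_1 = n) \ge (1 - a)\mathbb{P}(\Delta_1 > 0). \]

This proves the claim by induction. It follows from what precedes that
$\mathbb{P}(\tau_{\mathcal{N}} < \infty \mid \mathcal{F}_n) \ge 1 - a$, so that $\mathbb{P}$-almost surely

\[ \mathbf{1}_{\tau_{\mathcal{N}} < \infty} = \lim_{n \to \infty} \mathbb{P}(\tau_{\mathcal{N}} < \infty \mid \mathcal{F}_n) = 1. \]

(ii) For $k \ge 1$, let $S_k := \min\{i \ge 0 : n_k < T_i < n_{k+1}\} \in \mathbb{N} \cup \{\infty\}$. Then

\[ \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}) = \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}; S_k < \infty) + \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}; S_k = \infty). \]

Using the strong Markov property,

\[ \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}; S_k < \infty) = \mathbb{E}\big(\mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1} \mid \mathcal{F}_{S_k})\mathbf{1}_{\{S_k < \infty\}}\big) \]
\[ = \mathbb{E}\big((1 - v(n_{k+1} - T_{S_k}))\mathbf{1}_{\{T_{\tau_{\mathcal{N}}} > n_k\}}\mathbf{1}_{\{S_k < \infty\}}\big) \le a\,\mathbb{P}(T_{\tau_{\mathcal{N}}} > n_k; S_k < \infty). \]

On the other hand,

\[ \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}; S_k = \infty) = \sum_{i \ge 0} \mathbb{P}\big(\{T_0, T_1, \ldots, T_i\} \cap \{n_1, \ldots, n_k\} = \emptyset; T_i < n_k; T_{i+1} > n_{k+1}\big), \]

and

\[ \mathbb{P}\big(\{T_0, T_1, \ldots, T_i\} \cap \{n_1, \ldots, n_k\} = \emptyset; T_i < n_k; T_{i+1} > n_{k+1} \mid \mathcal{F}_i\big) \]
\[ = \mathbf{1}_{\{T_0, T_1, \ldots, T_i\} \cap \{n_1, \ldots, n_k\} = \emptyset}\mathbf{1}_{T_i < n_k}\mathbb{P}(\Delta_{i+1} > n_{k+1} - T_i \mid \mathcal{F}_i) \]
\[ \le a\,\mathbf{1}_{\{T_0, T_1, \ldots, T_i\} \cap \{n_1, \ldots, n_k\} = \emptyset}\mathbf{1}_{T_i < n_k}\mathbb{P}(\Delta_{i+1} \ge n_{k+1} - T_i \mid \mathcal{F}_i) \]

by the assumption on the hazard rate of $(\Delta_i)$. Therefore,

\[ \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}; S_k = \infty) \le a \sum_{i=0}^{\infty} \mathbb{P}\big(\{T_0, T_1, \ldots, T_i\} \cap \{n_1, \ldots, n_k\} = \emptyset; T_i < n_k; T_{i+1} \ge n_{k+1}\big) \]
\[ = a\,\mathbb{P}(T_{\tau_{\mathcal{N}}} > n_k; S_k = \infty). \]

Finally we have shown that

\[ \mathbb{P}(T_{\tau_{\mathcal{N}}} > n_{k+1}) \le a\,\mathbb{P}(T_{\tau_{\mathcal{N}}} > n_k). \]

(iii) Let $M_n := T_n - \mathbb{E}(T_n) = T_n - nm$, where $m := \mathbb{E}(\Delta_i)$. Then $(M_n)$ is an
$(\mathcal{F}_n)$-martingale with zero mean. Thus, by part (ii) of Theorem A.4,

\[ \mathbb{E}(M_{\tau_{\mathcal{N}} \wedge n}) = 0 = \mathbb{E}(T_{\tau_{\mathcal{N}} \wedge n}) - m\mathbb{E}(\tau_{\mathcal{N}} \wedge n), \]

and, by monotone convergence,

\[ m\mathbb{E}(\tau_{\mathcal{N}}) = \mathbb{E}(T_{\tau_{\mathcal{N}}}) = \sum_{k \ge 1} n_k \mathbb{P}(T_{\tau_{\mathcal{N}}} = n_k) = \sum_{k \ge 0} (n_{k+1} - n_k)\mathbb{P}(T_{\tau_{\mathcal{N}}} > n_k) \]

with the convention $n_0 := 0$.
(iv) This follows immediately from the proofs of (ii) and (iii).

□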


Proof (Theorem 7.18) In view of Theorem 7.8 and Proposition 7.11 it suffices to
show that there exists an invariant probability measure for $(X_n)$.

First observe that we can always assume that $\xi(C) > 0$, where $\xi$ is the minorizing
measure of $R_a$. Indeed, let $\xi_k(\cdot) = a^k \int \xi(dy) P^k(y, \cdot)$. Then for all $x \in C$

\[ R_a(x, \cdot) \ge a^k R_a P^k(x, \cdot) \ge \xi_k(\cdot) \]

so that $\xi_k$ is another minorizing measure. Now, there exists $k$ such that $\xi_k(C) > 0$,
for otherwise we would have $P^k(y, C) = 0$ for all $k$ and $\xi$-almost all $y$, in
contradiction with the assumption that $C$ is recurrent. Replacing $\xi$ by such a $\xi_k$
proves our claim.

Let $\tau_C < \tau_C^{(2)} < \tau_C^{(3)} < \ldots$ be the successive times at which $(X_n)$ enters $C$,
i.e., $\tau_C^{(n+1)} = \min\{k > \tau_C^{(n)} : X_k \in C\}$. By the assumption of the theorem and the
strong Markov property,

\[ \mathbb{E}_x(\tau_C^{(k)}) \le kM \]

for all $x \in C$, where $M := \sup_{x \in C} \mathbb{E}_x(\tau_C) < \infty$. Let $(Y_n)$ be the chain with kernel $R_a$,
$\tau_C^Y = \min\{n \ge 1 : Y_n \in C\}$, and $Q(x, \cdot)$ the kernel on $C$ defined by
$Q(x, A) = \mathbb{P}_x(Y_{\tau_C^Y} \in A)$ for all Borel sets $A \subset C$. By Proposition 7.20 (i), $\tau_C^Y < \infty$ a.s.
so that $Q$ is a Markov kernel (i.e., $Q(x, C) = 1$). Furthermore $Q(x, A) \ge R_a(x, A) \ge \varepsilon\psi(A)$
with $\varepsilon = \xi(C)$ and $\psi(A) = \frac{\xi(A)}{\xi(C)}$. In other words, $Q$ is a Markov kernel whose full state space
(here $C$) is a small set. Then, by a theorem that will be proved later (Theorem 8.7
in Chap. 8), $Q$ has a (unique) invariant probability measure $\pi$. If $Y_0$ is distributed
according to $\pi$ so is $Y_{\tau_C^Y}$, and by Proposition 7.20 (iii)


\[ \mathbb{E}_\pi(\tau_C^Y) \le \frac{M}{a}. \]


By Exercise 4.24 this implies that $(Y_n)$ (or equivalently $R_a$), hence $(X_n)$, admits an
invariant probability measure. □


7.6 Positive Recurrence for Feller Chains 


The next results give some (much more tractable) conditions ensuring that a Feller 
chain is positive recurrent. 


Theorem 7.21 Let $P$ be Feller. Assume that there exist a compact recurrent set $C$
such that $\sup_{x \in C} \mathbb{E}_x(\tau_C) < \infty$ and an accessible weak Doeblin point $x^* \in \mathrm{Int}(C)$
(the interior of $C$). Then the equivalent conditions of Theorem 7.8 hold true.


Proof By assumption there exist a neighborhood $U \subset C$ of $x^*$ and a nontrivial
measure $\xi$ such that $R_a(x, \cdot) \ge \xi(\cdot)$ for all $x \in U$. By Proposition 7.17, $U$ is
recurrent and $\sup_{x \in C} \mathbb{E}_x(\tau_U) < \infty$. We can then apply Theorem 7.18, with $U$ in
place of $C$. This proves the result. □


Corollary 7.22 Let $P$ be Feller. Assume that there exist an accessible weak Doeblin
point, a proper map $V : M \to \mathbb{R}_+$, and a nonnegative constant $R$ such that
$PV \le V - 1$ on $\{V > R\}$ and $\sup_{\{x \in M : V(x) \le R\}} PV(x) < \infty$. Then the equivalent
conditions of Theorem 7.8 hold true.

Proof Let $x^*$ be the accessible weak Doeblin point. Choose $R$ large enough so that
$V(x^*) < R$. Set $C = \{V \le R\}$ and apply Proposition 7.12 (a) and Theorem 7.21.
□


Theorem 7.23 Let $P$ be Feller. Assume that there exists an accessible weak Doeblin
point and that for all $x \in M$ the family of empirical occupation measures
$(\nu_n)$ is $\mathbb{P}_x$-almost surely tight (this is true for instance under the assumptions of
Corollary 7.22). Then the equivalent conditions of Theorem 7.8 hold true.

Proof By assumption there exists an open accessible petite set $C$. By Theorem 6.2
and Theorem 4.20, there exists a unique invariant probability measure $\pi$ for $P$ and
$\nu_n \Rightarrow \pi$, $\mathbb{P}_x$-almost surely, for all $x \in M$. Since $C$ is open and accessible, $\pi(C) > 0$
(see Proposition 5.8 (ii)) and, by the Portmanteau theorem, $\liminf \nu_n(C) \ge \pi(C)$.
This proves that every point $x$ leads almost surely to $C$. The result then follows from
Proposition 7.11 and Theorem 7.8. □


7.6.1 Application to PDMPs 


Let $E = \{1, \ldots, N\}$ be a set of environments and $\{G_i\}_{i \in E}$ a family of smooth
globally integrable vector fields on $\mathbb{R}^k$.

Consider the PDMP $Z_t = (Y_t, I_t) \in \mathbb{R}^k \times E$ as defined in Sect. 6.4. Recall that,
starting from $Z_0 = (Y_0, I_0) = (x, i) \in \mathbb{R}^k \times E$, $Y_t$ follows the flow induced
by the vector field $G_i$ during a time $\tau_1$ having an exponential distribution with
parameter $\lambda_i$ and $I_t = i$ on $[0, \tau_1)$. Then a new environment $j \in E$ is chosen
with probability $p_j > 0$, $Y_t$ follows the flow induced by $G_j$ during a time $\tau_2 - \tau_1$
having an exponential distribution with parameter $\lambda_j$, and $I_t = j$ on $[\tau_1, \tau_2)$, etc.

If now the initial environment $I_0$ is randomly chosen with law $\sum_{i \in E} p_i\delta_i$, then
$X_n = Y_{\tau_n}$ defines a Markov chain on $\mathbb{R}^k$, as explained in Sect. 6.3, whose kernel is
given as (see formula (6.3))

\[ Pf(x) = \sum_{i \in E} p_i \int_0^\infty f(\Phi^i_t(x))\lambda_i e^{-\lambda_i t}\, dt. \tag{7.5} \]


The following exercise shows that if there exists a common Lyapunov function for 
some (not necessarily all) of the vector fields G; and if this function does not grow 
too fast along the flows of the other vector fields, then it can serve as a Lyapunov 
function for P. 


Exercise 7.24 (Lyapunov Function for PDMPs)

(i) Suppose that there exists a proper $C^1$-map $V : \mathbb{R}^k \to \mathbb{R}_+$, and numbers
$\alpha_1, \ldots, \alpha_N$ (not necessarily negative) such that for each $i \in E$,

\[ \limsup_{\|x\| \to \infty} \frac{\langle \nabla V(x), G_i(x) \rangle}{V(x)} \le \alpha_i < \lambda_i. \]

Show that, if

\[ \sum_{i \in E} \frac{p_i\alpha_i}{\lambda_i - \alpha_i} < 0, \]

then

\[ PV \le \rho V + \kappa \]

for some $0 \le \rho < 1$ and $\kappa \ge 0$.

(ii) Let $\alpha_i(x)$ be the largest eigenvalue of the symmetric matrix $DG_i(x) +
DG_i(x)^T$ and let $\alpha_i = \sup_x \alpha_i(x)$. Show that

\[ \limsup_{\|x\| \to \infty} \frac{\langle x, G_i(x) \rangle}{\|x\|^2} = \limsup_{\|x\| \to \infty} \frac{\langle x, G_i(x) - G_i(0) \rangle}{\|x\|^2} \le \frac{\alpha_i}{2}. \]

Using (i), give conditions on $\alpha_1, \ldots, \alpha_N$ ensuring that $V(x) = \|x\|^2$ is a
Lyapunov function for $P$.
(iii) Suppose that there exists a proper $C^1$-map $W : \mathbb{R}^k \to \mathbb{R}_+$, and numbers
$a_1, \ldots, a_N$ (not necessarily negative) such that for each $i \in E$,

\[ \limsup_{\|x\| \to \infty} \langle \nabla W(x), G_i(x) \rangle \le a_i. \]

Show that, if

\[ \sum_{i \in E} \frac{p_i a_i}{\lambda_i} < 0, \]

then, for $\varepsilon > 0$ sufficiently small, the map $V = e^{\varepsilon W}$ and the numbers $\alpha_i = \varepsilon a_i$
satisfy the conditions given in (i).


Theorem 7.25 Suppose that there exists $V$ as in Exercise 7.24 (i) and an accessible
point (in the sense of Proposition 6.20) at which the weak bracket condition as
defined in Sect. 6.3.1 holds. Then the chain $(X_n)$ is positive recurrent. Furthermore,
the process $(Z_t)$ is also positive recurrent in the sense that, for all $f : M \times E \to \mathbb{R}$
measurable and bounded,

\[ \lim_{t \to \infty} \frac{1}{t}\int_0^t f(Z_s)\, ds = \tilde{\mu}(f) \]

almost surely, where $\tilde{\mu}$ stands for the invariant probability measure of $(Z_t)$.


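Proof Positive recurrence of $(X_n)$ follows from Corollary 7.22 and Theorem 6.16.
We now prove the second statement. Let $f \in B(\mathbb{R}^k \times E)$ and
$A = \{\lim_{t \to \infty} \frac{1}{t}\int_0^t f(Z_s)\, ds = \tilde{\mu}(f)\}$. In order to show that $\mathbb{P}_{x,i}(A) = 1$ for all
$(x, i) \in \mathbb{R}^k \times E$ it suffices to show that $\sum_{i \in E} p_i\mathbb{P}_{x,i}(A) = 1$ for all $x$. This means
one needs to show that $\lim_{t \to \infty} \frac{1}{t}\int_0^t f(Z_s^x)\, ds = \tilde{\mu}(f)$ almost surely, where
$(Z_t^x)$ stands for the PDMP with initial condition $(x, I_0)$ and $I_0$ has distribution
$\sum_{i \in E} p_i\delta_i$.

Let $\mathcal{G}_n = \sigma\{(I_0, \tau_1), (I_1, \tau_2 - \tau_1), \ldots, (I_{n-1}, \tau_n - \tau_{n-1})\}$. Then $\int_0^{\tau_n} f(Z_s^x)\, ds$
is $\mathcal{G}_n$-measurable, and

\[ \mathbb{E}\left(\int_{\tau_n}^{\tau_{n+1}} f(Z_s^x)\, ds \Bigm| \mathcal{G}_n\right) = \hat{f}(X_n), \]

where

\[ \hat{f}(x) = \sum_{i \in E} p_i \int_0^\infty \int_0^t f(\Phi^i_s(x), i)\lambda_i e^{-\lambda_i t}\, ds\, dt = \sum_{i \in E} p_i \int_0^\infty f(\Phi^i_t(x), i)e^{-\lambda_i t}\, dt. \]

Also,

\[ \mathrm{var}\left(\int_{\tau_n}^{\tau_{n+1}} f(Z_s^x)\, ds \Bigm| \mathcal{G}_n\right) \le \mathbb{E}\big((\tau_{n+1} - \tau_n)^2 \mid \mathcal{G}_n\big)\|f\|_\infty^2 \le \max_{i \in E} \frac{2}{\lambda_i^2}\|f\|_\infty^2. \]

Thus, by the strong law of large numbers for martingales (see Theorem A.8),

\[ \lim_{n \to \infty} \frac{1}{n}\left(\int_0^{\tau_n} f(Z_s^x)\, ds - \sum_{k=0}^{n-1} \hat{f}(X_k)\right) = 0 \]

almost surely. On the other hand, by the strong law of large numbers, $\lim_{n \to \infty} \frac{\tau_n}{n} =
\sum_{i \in E} p_i/\lambda_i$. Thus

\[ \lim_{t \to \infty} \frac{1}{t}\int_0^t f(Z_s^x)\, ds = \frac{\mu(\hat{f})}{\sum_{i \in E} p_i/\lambda_i}, \]

where $\mu$ is the invariant probability measure of $(X_n)$. This proves the positive
recurrence of $(Z_t)$ and also gives - in this special case - an alternative proof of
Theorem 6.26 (i). □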


7.6.2 Application to SDEs 


Using the notation and assumptions of Sect. 6.5, consider the stochastic differential
equation (6.12). Recall from the proof of Theorem 6.34 that the "formal" generator
of (6.12) is the operator $L$ defined on $C^2$-functions $f : \mathbb{R}^k \to \mathbb{R}$ by

\[ Lf(x) = G_0(f)(x) + \frac{1}{2}\sum_{i=1}^N G_i^2(f)(x). \]


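Lemma 7.26 Suppose there exists a proper $C^2$-function $U : \mathbb{R}^k \to \mathbb{R}_+$ and
positive numbers $\alpha, \beta$ such that

\[ LU \le -\alpha U + \beta. \]

Then, for all $t \ge 0$,

\[ P_tU \le e^{-\alpha t}U + \frac{\beta}{\alpha}(1 - e^{-\alpha t}). \]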


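Proof Set $W_t = e^{\alpha t}(U(X_t^x) - \frac{\beta}{\alpha})$. By Itô's formula,

\[ W_t - W_0 = \int_0^t e^{\alpha s}\big[\alpha U(X_s^x) - \beta + LU(X_s^x)\big]\, ds + M_t \le M_t, \]

where $(M_t)_{t \ge 0}$ is a local martingale with $M_0 = 0$. Thus, for all $n \in \mathbb{N}$, $(M_t^n) =
(M_{t \wedge n})_{t \ge 0}$ is a continuous local martingale which is bounded below (by
$-e^{\alpha n}\frac{\beta}{\alpha} - (U(x) - \frac{\beta}{\alpha})$). A local martingale that is bounded below may not be a martingale but
is always a supermartingale (see, e.g., [45, Proposition 4.7]). Therefore $\mathbb{E}(W_{t \wedge n}) -
U(x) + \frac{\beta}{\alpha} \le \mathbb{E}(M_t^n) \le \mathbb{E}(M_0) = 0$. Hence, $\mathbb{E}(U(X^x_{t \wedge n})) \le e^{-\alpha(t \wedge n)}(U(x) - \frac{\beta}{\alpha}) + \frac{\beta}{\alpha}$
and the desired result follows by Fatou's lemma. □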


Corollary 7.27 Suppose there exists a proper $C^2$-function as in Lemma 7.26 for
(6.12) and an accessible point $p$ at which the weak Hörmander condition is satisfied.
Then $\{P_t\}_{t \ge 0}$ has a unique invariant probability measure $\mu$ and for every $f \in B(\mathbb{R}^k)$
and $x \in \mathbb{R}^k$

\[ \lim_{t \to \infty} \frac{1}{t}\int_0^t f(X_s^x)\, ds = \mu(f). \]

Proof Let $G$ be the 1-resolvent. Then, by Lemma 7.26, $GU \le \frac{1}{1+\alpha}U + \frac{\beta}{1+\alpha}$. By
Corollary 7.22 and Theorem 6.34, $G$ is a positive recurrent Markov kernel. The
final statement follows from Proposition 4.58 (ii). □
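For the reader's convenience, the resolvent bound used in the proof of Corollary 7.27 can be spelled out as follows. The computation assumes that the 1-resolvent is given by $G = \int_0^\infty e^{-t}P_t\,dt$, which is the usual convention but is not restated in this section.

```latex
\begin{align*}
GU = \int_0^\infty e^{-t} P_t U \, dt
  &\le \int_0^\infty e^{-t}\Bigl( e^{-\alpha t} U + \frac{\beta}{\alpha}\bigl(1 - e^{-\alpha t}\bigr) \Bigr) dt
     && \text{(Lemma 7.26)} \\
  &= \frac{1}{1+\alpha}\, U + \frac{\beta}{\alpha}\Bigl( 1 - \frac{1}{1+\alpha} \Bigr) \\
  &= \frac{1}{1+\alpha}\, U + \frac{\beta}{1+\alpha}.
\end{align*}
```

In particular $G$ satisfies a Lyapunov inequality $GU \le \rho U + \kappa$ with $\rho = 1/(1+\alpha) < 1$ and $\kappa = \beta/(1+\alpha)$.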


Chapter 8
Harris Ergodic Theorem


This chapter generalizes the convergence theorems proved in Chap. 2 for countable 
chains to general chains under the assumption that there exist an aperiodic small set 
(or Doeblin point) and, when M is noncompact, a Lyapunov function or a suitable 
control on the moments of the return time to this set. The final section discusses 
convergence in Wasserstein distance. 


8.1 Total Variation Distance 


Recall that $B(M)$ is the set of real-valued bounded measurable maps on $M$. Given
two probability measures $\alpha$ and $\beta$ on $M$, the total variation distance between $\alpha$ and
$\beta$ is defined by

\[ |\alpha - \beta| = \sup\{|\alpha(f) - \beta(f)| : f \in B(M), \|f\|_\infty \le 1\}. \tag{8.1} \]

See also Remark 5.21 in Sect. 5.3. It is easy to verify that the total variation distance
defines a metric on $\mathcal{P}(M)$.
Note that if $K$ is a Markov kernel on $M$,

\[ |\alpha K - \beta K| \le |\alpha - \beta| \tag{8.2} \]

because $K$ maps $\{f \in B(M) : \|f\|_\infty \le 1\}$ into itself.

Proposition 8.1 Let $\alpha, \beta \in \mathcal{P}(M)$.

(i)

\[ |\alpha - \beta| = 2\sup_{A \in \mathcal{B}(M)} \big(\alpha(A) - \beta(A)\big). \]

(ii) Assume $\alpha$ and $\beta$ are absolutely continuous with respect to $\gamma \in \mathcal{P}(M)$ with
respective densities $p$ and $q$. Then

\[ |\alpha - \beta| = \int |p - q|\, d\gamma. \]

(iii) The space $\mathcal{P}(M)$ equipped with the total variation distance is complete.


Proof We begin by proving assertion (ii). For all $f \in B(M)$ with $\|f\|_\infty \le 1$,
$|\alpha(f) - \beta(f)| \le \int |p - q|\, d\gamma$ so that $|\alpha - \beta| \le \int |p - q|\, d\gamma$. Conversely, set
$f = \mathbf{1}_{p > q} - \mathbf{1}_{p < q}$. Then $\alpha(f) - \beta(f) = \int |p - q|\, d\gamma$.

We now pass to the proof of (i). We can always assume that for some $\gamma \in \mathcal{P}(M)$,
$\alpha$ and $\beta$ are absolutely continuous with respect to $\gamma$. It suffices for instance to choose
$\gamma = \frac{\alpha + \beta}{2}$. Then

\[ |\alpha - \beta| = \int_G (p - q)\, d\gamma + \int_{M \setminus G} (q - p)\, d\gamma = 2(\alpha(G) - \beta(G)) \]

with $G = \{p > q\}$. Also, for all $A \in \mathcal{B}(M)$, $\alpha(A) - \beta(A) \le \alpha(A \cap G) - \beta(A \cap G) \le
\alpha(G) - \beta(G)$. Our last task is to prove completeness. Let $(\mu_n)$ be a Cauchy sequence
for the total variation distance. Then, in view of (i), for every Borel set $A$, $(\mu_n(A))$
is a Cauchy sequence in $\mathbb{R}$, hence converges to some number $\mu(A)$. By the Cauchy
property, the convergence is uniform in $A$, i.e., $\sup_{A \in \mathcal{B}(M)} |\mu_n(A) - \mu(A)| \to 0$.
From this it is easy to verify that $\mu$ is a probability measure on $M$. □
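On a finite state space the three descriptions of the total variation distance in (8.1) and Proposition 8.1 can be compared directly. The sketch below is only an illustration: probability vectors are plain Python lists, and the supremum over Borel sets is taken by brute force over all subsets, which is feasible only for very small state spaces.

```python
from itertools import combinations


def tv_by_density(alpha, beta):
    """Proposition 8.1 (ii) on a finite set: the integral of |p - q| reduces to
    the sum of |alpha_i - beta_i|, whatever the dominating measure."""
    return sum(abs(a - b) for a, b in zip(alpha, beta))


def tv_by_sets(alpha, beta):
    """Proposition 8.1 (i): 2 * sup_A (alpha(A) - beta(A)), brute force over subsets."""
    n = len(alpha)
    best = 0.0  # value for the empty set
    for r in range(1, n + 1):
        for A in combinations(range(n), r):
            best = max(best, sum(alpha[i] - beta[i] for i in A))
    return 2 * best


def tv_by_test_function(alpha, beta):
    """Definition (8.1), evaluated at the maximizer f = 1_{p>q} - 1_{p<q}."""
    f = [(a > b) - (a < b) for a, b in zip(alpha, beta)]
    return abs(sum(fi * (a - b) for fi, a, b in zip(f, alpha, beta)))


alpha = [0.5, 0.3, 0.2, 0.0]
beta = [0.1, 0.3, 0.2, 0.4]
values = (tv_by_density(alpha, beta),
          tv_by_sets(alpha, beta),
          tv_by_test_function(alpha, beta))
print(values)
```

All three functions return the same number up to rounding, and the maximizing set in `tv_by_sets` is $G = \{p > q\}$ as in the proof.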


Exercise 8.2 For $f : M \to \mathbb{R}$, let $\Delta(f) = \sup\{\frac{|f(x) - f(y)|}{2} : x, y \in M\}$. Show that

\[ |\alpha - \beta| = \sup\{|\alpha(f) - \beta(f)| : f \text{ measurable}, \Delta(f) \le 1\}. \]


Remark 8.3 Although the total variation distance (8.1) and the Fortet-Mourier
distance (4.2) look very similar, they induce quite different topologies on $\mathcal{P}(M)$.
Clearly,

\[ \rho(\alpha, \beta) \le |\alpha - \beta| \]

so that convergence in total variation implies weak convergence; but the converse is
false. Let, for example, $X$ be a random variable on $\mathbb{R}$ whose law $P_X$ is absolutely
continuous with respect to the Lebesgue measure $dx$ (e.g., a Gaussian random
variable) and $X_n = \frac{X}{n}$. Then $X_n \to 0$ almost surely, hence $P_{X_n} \Rightarrow \delta_0$, while
$|P_{X_n} - \delta_0| = 2$ by Proposition 8.1 (i).


Remark 8.4 (Total Variation of Signed Measures) A finite signed measure on $M$ is
a map $\mu : \mathcal{B}(M) \to \mathbb{R}$ such that $\mu(\emptyset) = 0$ and which is $\sigma$-additive, meaning that

\[ \mu\Big(\bigcup_n A_n\Big) = \sum_n \mu(A_n) \]

for any family $\{A_n\}$, $A_n \in \mathcal{B}(M)$, having disjoint elements. The Hahn-Jordan
decomposition theorem (see [21], Theorem 5.6.1) asserts that such a measure can
be written as

\[ \mu = \mu^+ - \mu^-, \]

where $\mu^+$ and $\mu^-$ are nonnegative measures that are mutually singular: There exists
$D \in \mathcal{B}(M)$ such that for all $A \in \mathcal{B}(M)$, $\mu^+(A) = \mu(A \cap D)$ and $\mu^-(A) =
-\mu(A \cap D^c)$. The total variation measure of $\mu$ is the nonnegative measure $\mu^+ + \mu^-$
and its total variation norm is

\[ |\mu| = \mu^+(M) + \mu^-(M) = \sup\{|\mu(f)| : f \in B(M), \|f\|_\infty \le 1\}. \]


When M is a compact metric space, the topological dual C*(M) of C(M) can 
be identified with the space of bounded signed measures equipped with the total 
variation norm, so that convergence in total variation coincides with (strong) 
convergence in C*(M). We refer the reader to [21], Chapter 7, for more details 
and a proof of this latter point. 


Exercise 8.5 Use the Hahn-Jordan decomposition to show assertion (i) of
Proposition 8.1.


8.1.1 Coupling 


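Given $\alpha, \beta \in \mathcal{P}(M)$, a coupling of $\alpha$ and $\beta$ is a random vector $(X, Y)$ defined
on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ taking values in $M \times M$ such that $X$ has
distribution $\alpha$ and $Y$ has distribution $\beta$.

Proposition 8.6 Let $\alpha, \beta \in \mathcal{P}(M)$. Then

(i) (Coupling inequality) For every coupling $(X, Y)$ of $(\alpha, \beta)$,

\[ |\alpha - \beta| \le 2\mathbb{P}(X \ne Y); \]

(ii) (Maximal coupling) There exists a coupling $(X, Y)$ of $(\alpha, \beta)$ such that

\[ |\alpha - \beta| = 2\mathbb{P}(X \ne Y). \]

Proof
(i) For all $A \in \mathcal{B}(M)$,

\[ \mathbb{P}(X \in A) - \mathbb{P}(Y \in A) = \mathbb{P}(X \in A; X \ne Y) - \mathbb{P}(Y \in A; X \ne Y) \le \mathbb{P}(X \ne Y). \]

This inequality, combined with Proposition 8.1 (i), proves (i).
(ii) Assume without loss of generality that $d\alpha = p\, d\gamma$ and $d\beta = q\, d\gamma$ for some
$\gamma \in \mathcal{P}(M)$ (e.g., $\gamma = (\alpha + \beta)/2$). Then, by Proposition 8.1 (ii),

\[ |\alpha - \beta| = \int |p - q|\, d\gamma = 2(1 - \varepsilon), \]

where $\varepsilon = \int (p \wedge q)\, d\gamma$. If $\varepsilon = 0$, $\alpha$ and $\beta$ are mutually singular and any
coupling satisfies the equality $|\alpha - \beta| = 2\mathbb{P}(X \ne Y) = 2$. If $\varepsilon \ne 0$, let
$U \in M$, $V \in M$, $W \in M$, $\Theta \in \{0, 1\}$ be independent random variables having
distributions $\frac{1}{\varepsilon}(p \wedge q)\, d\gamma$, $\frac{1}{1-\varepsilon}(p - (p \wedge q))\, d\gamma$, $\frac{1}{1-\varepsilon}(q - (p \wedge q))\, d\gamma$, and
$(1 - \varepsilon)\delta_0 + \varepsilon\delta_1$, respectively. Set $X = \Theta U + (1 - \Theta)V$ and $Y = \Theta U + (1 - \Theta)W$.
Then $\mathbb{P}(X \ne Y) = \mathbb{P}(\Theta = 0) = (1 - \varepsilon)$ and $(X, Y)$ is a coupling of $(\alpha, \beta)$.

□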
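The construction in the proof of Proposition 8.6 (ii) can be written down explicitly on a finite state space. The sketch below builds the joint probability mass function of $(X, Y)$ rather than sampling from it, so that the marginals and $\mathbb{P}(X \ne Y)$ can be checked exactly (up to rounding). The function name and the list-of-lists representation are choices made for this illustration.

```python
def maximal_coupling(alpha, beta):
    """Joint pmf J[x][y] = P(X = x, Y = y) of the coupling built in the proof of
    Proposition 8.6 (ii), for two probability vectors on {0, ..., n-1}."""
    n = len(alpha)
    common = [min(a, b) for a, b in zip(alpha, beta)]   # the density p ∧ q
    eps = sum(common)                                    # eps = integral of p ∧ q
    J = [[0.0] * n for _ in range(n)]
    for x in range(n):
        J[x][x] += common[x]          # Theta = 1: X = Y = U, total mass eps
    if eps < 1.0:
        for x in range(n):            # Theta = 0: X = V and Y = W, independent
            for y in range(n):
                J[x][y] += (alpha[x] - common[x]) * (beta[y] - common[y]) / (1.0 - eps)
    return J


alpha = [0.5, 0.3, 0.2, 0.0]
beta = [0.1, 0.3, 0.2, 0.4]
J = maximal_coupling(alpha, beta)

n = len(alpha)
marginal_x = [sum(J[x][y] for y in range(n)) for x in range(n)]
marginal_y = [sum(J[x][y] for x in range(n)) for y in range(n)]
prob_differ = sum(J[x][y] for x in range(n) for y in range(n) if x != y)
total_variation = sum(abs(a - b) for a, b in zip(alpha, beta))
print(prob_differ, total_variation / 2)
```

Because the laws of $V$ and $W$ have disjoint supports, all the mass contributed by the case $\Theta = 0$ sits off the diagonal, which is why $\mathbb{P}(X \ne Y) = 1 - \varepsilon$.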


8.2 Harris Convergence Theorems 


Throughout this section $P$ is a Markov kernel on $M$. Recall from Chap. 6 that a set
$C \in \mathcal{B}(M)$ is called a small set for $P$ if there exists a nontrivial measure $\xi$ on $M$
(called the minorizing measure of $C$) such that

\[ P(x, \cdot) \ge \xi(\cdot) \tag{8.3} \]

for all $x \in C$. Recall also that a point in $M$ is called a Doeblin point if it has a
neighborhood which is a small set.


8.2.1 Geometric Convergence


The importance of small sets is emphasized by the following simple version of 
Harris's theorem (sometimes called Doeblin's theorem). 


Theorem 8.7 Let $m \in \mathbb{N}^*$. Suppose that $M$ is a small set for $P^m$ with minorizing
measure $\xi$. Then, for all $\alpha, \beta \in \mathcal{P}(M)$,

\[ |\alpha P^n - \beta P^n| \le (1 - \varepsilon)^{\lfloor n/m \rfloor}|\alpha - \beta|, \]

where $0 < \varepsilon = \xi(M) \le 1$. Furthermore $P$ has a unique invariant probability
measure $\pi$ and

\[ |\alpha P^n - \pi| \le (1 - \varepsilon)^{\lfloor n/m \rfloor}|\alpha - \pi|. \]


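Proof First suppose $m = 1$. Set $\psi = \frac{\xi}{\varepsilon}$, $\varepsilon = \xi(M)$, and

\[ K(x, \cdot) = \frac{P(x, \cdot) - \varepsilon\psi(\cdot)}{1 - \varepsilon} \]

if $\varepsilon < 1$. Then $K$ is a Markov kernel and $\alpha P = \varepsilon\psi + (1 - \varepsilon)\alpha K$ so that

\[ |\alpha P - \beta P| = (1 - \varepsilon)|\alpha K - \beta K| \le (1 - \varepsilon)|\alpha - \beta|, \]

where the last inequality follows from (8.2). Hence, $\alpha \mapsto \alpha P$ is a strict contraction
for the total variation distance. Then

\[ |\alpha P^n - \beta P^n| \le (1 - \varepsilon)^n|\alpha - \beta| \]

and $\alpha \mapsto \alpha P$ has a unique fixed point, by application of the Banach fixed point
theorem, because the space of probability measures endowed with the total variation
distance is complete.

If now $m > 1$, set $Q = P^m$. Write $n = km + r$ for $r \in \{0, \ldots, m - 1\}$ and

\[ |\alpha P^n - \beta P^n| = |\alpha P^r Q^k - \beta P^r Q^k| \le (1 - \varepsilon)^k|\alpha P^r - \beta P^r| \le (1 - \varepsilon)^k|\alpha - \beta|. \]

To conclude, recall that if $\pi$ is invariant for $P^m$, then $\frac{1}{m}\sum_{k=0}^{m-1} \pi P^k$ is invariant
for $P$. □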
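The contraction in Theorem 8.7 is easy to observe on a finite chain with $m = 1$. In the sketch below the transition matrix is an arbitrary example chosen for illustration, and the minorizing measure is taken to be $\xi(y) = \min_x P(x, y)$, the largest measure satisfying (8.3) on the whole space.

```python
def step(mu, P):
    """One step of the chain on measures: (mu P)(y) = sum_x mu(x) P(x, y)."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]


def total_variation(mu, nu):
    """Total variation distance (8.1) on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def doeblin_constant(P):
    """eps = xi(M) for the minorizing measure xi(y) = min_x P(x, y)."""
    n = len(P)
    return sum(min(P[x][y] for x in range(n)) for y in range(n))


# Illustrative 3-state kernel with positive entries (an assumption of this example).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
eps = doeblin_constant(P)

alpha, beta = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
initial_gap = total_variation(alpha, beta)
gaps = []
for n in range(1, 11):
    alpha, beta = step(alpha, P), step(beta, P)
    gaps.append((total_variation(alpha, beta), (1 - eps) ** n * initial_gap))
print(eps, gaps[0])
```

Each pair in `gaps` contains the observed distance after $n$ steps and the bound $(1 - \varepsilon)^n|\alpha - \beta|$; the first entry never exceeds the second.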


Remark 8.8 Theorem 8.7 is purely measure-theoretic and does not require that M 
is a metric space. 


Aperiodic Small Sets 

A measurable set $C \subset M$ is said to be aperiodic if the set

\[ R(C) = \{k \ge 1 : \inf_{x \in C} P^k(x, C) > 0\} \]

is nonempty and aperiodic as defined in Sect. 2.4.


Exercise 8.9 


(a) Let $P$ be Feller and let $U \subset M$ be an open, accessible (i.e., $R_a(x, U) > 0$ for
all $x \in M$) small set. Show that $R(U)$ is nonempty.




(b) Construct a Feller Markov chain having an open recurrent set $U$ for which
$R(U) = \emptyset$. Hint: Let $\{\Phi_t\}$ be the flow on $S^1 = \mathbb{R}/2\pi\mathbb{Z}$ induced by the
differential equation $\dot{\theta} = \sin^2(\theta/2)$. Consider the deterministic chain defined as
$X_n = \Phi_n(x)$. One can show that every proper neighborhood $U$ of 0 is recurrent,
but $R(U) = \emptyset$.


Let $x^* \in M$ be an accessible Doeblin point for $P$ Feller. We say that $x^*$ is
aperiodic if it has a neighboring small set $U$ which is aperiodic. Observe that if $U$ is
a neighboring small set of $x^*$ such that $\xi(U) > 0$ (where $\xi$ stands for the minorizing
measure of $U$) then $x^*$ is aperiodic.


Proposition 8.10 Assume $P$ is Feller. Let $x^* \in M$ be an accessible and aperiodic
Doeblin point and let $C \subset M$ be a compact set. Then there exists $m \ge 1$ such that
$C$ is a small set for $P^m$.


Proof Let $U$ be an open neighboring small set of $x^*$ with $R(U)$ aperiodic. Then,
by aperiodicity, there exists $n_0 \in \mathbb{N}$ such that $k \in R(U)$ for all $k \ge n_0$ (see
Proposition 2.21).

For $\delta > 0$ and $k \in \mathbb{N}^*$ let $O(\delta, k) = \{x \in M : P^k(x, U) > \delta\}$. By Feller
continuity and the Portmanteau theorem 4.1, $O(\delta, k)$ is an open set. Since $x^*$ is
accessible, the family $\{O(\delta, k) : \delta > 0, k \in \mathbb{N}^*\}$ covers $M$. Thus, by compactness,
there exist $\delta > 0$ and integers $k_1, \ldots, k_n$ such that $C \subset \bigcup_{i=1}^n O(\delta, k_i)$. For $x \in
O(\delta, k_i)$ and $k > n_0$,

\[ P^{k + k_i}(x, \cdot) \ge \int_U P^{k_i}(x, dy) P^k(y, \cdot) \ge \int_U P^{k_i}(x, dy) P^{k-1}(y, U)\xi(\cdot) \ge \delta \inf_{y \in U} P^{k-1}(y, U)\xi(\cdot). \]

Here $\xi$ stands for the minorizing measure of $U$. Thus, for $m = \max\{k_1, \ldots, k_n\} +
n_0 + 1$ and some $\delta' > 0$,

\[ \inf_{x \in C} P^m(x, \cdot) \ge \delta'\xi(\cdot). \]

□


Theorem 8.7 and Proposition 8.10 imply the following useful result for Feller 
chains on compact sets. 


Corollary 8.11 Assume P is Feller on M compact and that there exists an 
accessible and aperiodic Doeblin point. Then the conclusion of Theorem 8.7 holds. 


When M is not compact, the assumption (made in Theorem 8.7 or used in 
Corollary 8.11) that the whole space is a small set is usually not satisfied. A 
sufficient condition ensuring geometric convergence is the existence of a small set 
and a Lyapunov function forcing the system to enter this small set. A classical proof
relying on coupling and renewal properties will be given in the next section. Hairer
and Mattingly in [36] gave an alternative beautiful proof based on the construction 
of a suitable semi-norm making P a strict contraction. This proof is given below. 


Theorem 8.12 (Harris, Hairer and Mattingly) Assume that there exist:

(a) A measurable map $V : M \to \mathbb{R}_+$, $0 \le \rho < 1$, and $\kappa \ge 0$ such that
$PV \le \rho V + \kappa$;

(b) A probability measure $\psi$ on $M$ and $0 < \varepsilon < 1$ such that

\[ P(x, \cdot) \ge \varepsilon\psi(\cdot) \]

for all $x \in V_R := \{x \in M : V(x) \le R\}$ and $R > 2\kappa/(1 - \rho)$.

Then there exist a unique invariant probability measure $\pi$ for $P$ and constants
$0 < \gamma < 1$, $C > 0$ such that for all $f : M \to \mathbb{R}$ measurable with
$\|f\|_V := \sup_{x \in M} \frac{|f(x)|}{1 + V(x)} < \infty$,

\[ |P^n f(x) - \pi(f)| \le C\gamma^n(1 + V(x))\|f\|_V \]

for all $x \in M$ and $n \in \mathbb{N}^*$.


Proof For B > 0 and f : M — R measurable, possibly unbounded, let 


E IFŒ@- O. 
Iri = sop) ey 


We claim that for some 1 > 8 > OandO < y < 1, 
Iflg x 1 IPfllg <y. (8.4) 


Assume the claim is proved. Observe that || fli < Iflg < gilli xz zllfilv- 
Then 


IP” fli < IP” flg x v"llfllg S v" 8 fiv- 


Equivalently 


IP” f(x) - Pf) x y"8 !flv2-2-VG) VQ). x,y eM. 


Thus, 


[P"f(x)-mf| < I IP” f(x) — P^ f(x (dy) < y" B! lf llv Qo VG) + rV), 
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where x is some (hence unique) invariant probability measure (see Exercise 8.13). 
This proves the result. 

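We now prove the claim. Let $f$ be such that $\|f\|_\beta \le 1$ and let $x, y \in M$. Suppose
first that $V(x) + V(y) \ge R$. Then
$$\begin{aligned}
|Pf(x) - Pf(y)| &= \Bigl|\int (f(u) - f(v))\,\delta_x P(du)\,\delta_y P(dv)\Bigr| \\
&\le \int |f(u) - f(v)|\,\delta_x P(du)\,\delta_y P(dv) \le 2 + \beta PV(x) + \beta PV(y) \\
&\le 2 + 2\beta\kappa + \rho\beta (V(x) + V(y)) \le \gamma_1 \bigl(2 + \beta(V(x) + V(y))\bigr),
\end{aligned}$$
where
$$\gamma_1 = \frac{\beta(2\kappa + \rho R) + 2}{\beta R + 2} \in (0, 1).$$
The last inequality follows from the fact that for all $\rho, r \ge 0$ and $a \ge 2\rho$,
$$t \ge r \;\Rightarrow\; a + \rho t \le \gamma_1 (2 + t),$$
where $\gamma_1$ is the solution to $a + \rho r = \gamma_1 (2 + r)$. It suffices to set $a = 2 + 2\beta\kappa$ and
$r = \beta R$.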

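Suppose now that $V(x) + V(y) < R$. In particular, $x, y \in V_R$. As in the proof of
Theorem 8.7, write $Pf = (1 - \varepsilon)Kf + \varepsilon\psi(f)$, where, for all $x \in V_R$, $K(x, \cdot)$ is a
Markov operator. Thus
$$|Pf(x) - Pf(y)| = (1 - \varepsilon)|Kf(x) - Kf(y)| \le (1 - \varepsilon)\bigl(2 + \beta(KV(x) + KV(y))\bigr).$$
Also, $(1 - \varepsilon)KV(x) = PV(x) - \varepsilon\psi V \le \rho V(x) + \kappa$. Thus
$$|Pf(x) - Pf(y)| \le 2(1 - \varepsilon) + 2\beta\kappa + \rho\beta(V(x) + V(y)) \le \gamma_2 \bigl(2 + \beta(V(x) + V(y))\bigr)$$
with $\gamma_2 = \max(\rho, 1 - \varepsilon + \beta\kappa)$. Finally it suffices to choose $\beta\kappa < \varepsilon$ and to set
$\gamma = \max(\gamma_1, \gamma_2)$. $\square$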
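The constants appearing in this proof can be evaluated numerically. The following is a minimal sketch, assuming the formulas $\gamma_1 = (\beta(2\kappa + \rho R) + 2)/(\beta R + 2)$ and $\gamma_2 = \max(\rho, 1 - \varepsilon + \beta\kappa)$ read off from the argument above; the function name and the parameter values are hypothetical and serve only as an illustration.

```python
def contraction_constants(rho, kappa, eps, R, beta):
    """Return (gamma1, gamma2, gamma) as in the proof of Theorem 8.12.

    The checks mirror the hypotheses used in the proof; the numerical
    values passed below are illustrative assumptions, not from the text.
    """
    if not (0 <= rho < 1 and kappa >= 0 and 0 < eps <= 1):
        raise ValueError("need 0 <= rho < 1, kappa >= 0, 0 < eps <= 1")
    if not R > 2 * kappa / (1 - rho):
        raise ValueError("need R > 2*kappa/(1 - rho)")
    if not (beta > 0 and beta * kappa < eps):
        raise ValueError("need beta > 0 and beta*kappa < eps")
    gamma1 = (beta * (2 * kappa + rho * R) + 2) / (beta * R + 2)
    gamma2 = max(rho, 1 - eps + beta * kappa)
    return gamma1, gamma2, max(gamma1, gamma2)


# Hypothetical parameters: R = 5 exceeds 2*kappa/(1 - rho) = 4.
g1, g2, g = contraction_constants(rho=0.5, kappa=1.0, eps=0.3, R=5.0, beta=0.1)
print(g1, g2, g)
```

With these values the contraction constant is dominated by $\gamma_1$, which approaches 1 as $R$ approaches the threshold $2\kappa/(1-\rho)$.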


Exercise 8.13 


(i) Suppose that $M$ is a Polish space, $P$ is Feller, and that there exists a proper
and continuous map $V : M \to \mathbb{R}_+$ satisfying assumption (a) of Theorem 8.12.
Show that the set $\mathrm{Inv}(P)$ is nonempty. Hint: Use Corollary 4.23.
(ii) Suppose only that $M$ is a measurable space. Show that $\mathcal{P}_V(M) = \{\mu \in \mathcal{P}(M) : V \in L^1(\mu)\}$ is complete for the distance
$$|\mu - \nu|_\beta := \sup\{|\mu f - \nu f| : f : M \to \mathbb{R} \text{ measurable}, \ \|f\|_\beta \le 1\}.$$
Deduce that, under the assumptions of Theorem 8.12, there exists a unique
invariant probability measure for $P$. Hint: Use Inequality (8.4) to show that
$$|\mu P - \nu P|_\beta \le \gamma |\mu - \nu|_\beta \qquad (8.5)$$
for some $0 < \gamma < 1$ and $\beta > 0$.


Corollary 8.14 Suppose $P$ is Feller and that there exists a proper map $V : M \to \mathbb{R}_+$
satisfying assumption (a) of Theorem 8.12. Suppose furthermore that there
exists an accessible aperiodic Doeblin point. Then the conclusion of Theorem 8.12
holds true.


Proof Choose $R > \frac{2\kappa}{(1 - \rho)^2}$. The set $C = \{V \le R\}$ is a compact set (because $V$ is
proper) and small for some $P^n$ by Proposition 8.10. Since $P^n V \le \rho^n V + \frac{\kappa}{1 - \rho}$,
Theorem 8.12 applies to $P^n$ and the result follows. $\square$
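The iterated drift bound used in this proof can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: it iterates the scalar map $v \mapsto \rho v + \kappa$ (which dominates $P^n V$ when $PV \le \rho V + \kappa$) and compares the radius required by Theorem 8.12 for $P^n$, namely $2\kappa'/(1-\rho')$ with $\rho' = \rho^n$ and $\kappa' = \kappa/(1-\rho)$, against a bound that does not depend on $n$. All names and numerical values are hypothetical.

```python
def iterate_drift(v0, rho, kappa, n):
    """Apply the bound v -> rho*v + kappa n times, starting from v0 = V(x)."""
    v = v0
    for _ in range(n):
        v = rho * v + kappa
    return v


def radius_threshold(rho, kappa, n):
    """Threshold 2*kappa'/(1 - rho') for P^n, rho' = rho**n, kappa' = kappa/(1 - rho)."""
    return 2 * (kappa / (1 - rho)) / (1 - rho ** n)


# Hypothetical drift constants.
rho_demo, kappa_demo = 0.8, 2.0

# n-step bound: P^n V <= rho^n V + kappa/(1 - rho).
drift_ok = all(
    iterate_drift(v0, rho_demo, kappa_demo, n)
    <= rho_demo ** n * v0 + kappa_demo / (1 - rho_demo) + 1e-9
    for v0 in (0.0, 1.0, 50.0)
    for n in (1, 5, 20)
)

# A radius that works for every n >= 1, since 1 - rho**n >= 1 - rho.
uniform_radius = 2 * kappa_demo / (1 - rho_demo) ** 2
print(drift_ok, uniform_radius)
```

Choosing the radius uniformly in $n$ matters because the integer $n$ for which the sublevel set is small for $P^n$ is only known after the set has been fixed.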


8.2.2 Continuous Time: Exponential Convergence


For a weak Feller continuous-time Markov process $\{P_t\}_{t \ge 0}$, aperiodicity is not an
issue. Indeed, if a point $p \in M$ is accessible for $\{P_t\}_{t \ge 0}$ and is a Doeblin point for
some $P_T$, then $p$ is necessarily aperiodic for $P_T$. This is a direct consequence of
Lemma 6.5. Thus, the continuous-time version of Corollary 8.14 reads as follows:


Theorem 8.15 Let $\{P_t\}_{t \ge 0}$ be a continuous-time weak Feller semigroup. Assume in
addition the following:

(i) There exists a point $p \in M$ which is accessible for $\{P_t\}_{t \ge 0}$ and which is a
Doeblin point for some $P_{T_0}$, with $T_0 > 0$;

(ii) There exist a proper map $V : M \to \mathbb{R}_+$, $0 \le \rho < 1$, $\kappa \ge 0$, and $T_1 > 0$ such
that
$$P_{T_1} V \le \rho V + \kappa.$$

Then there exist a unique invariant probability measure $\pi$ for $\{P_t\}_{t \ge 0}$ and constants
$\alpha > 0$, $C \ge 0$ such that for all $f : M \to \mathbb{R}$ measurable,
$$|P_t f(x) - \pi(f)| \le C e^{-\alpha t} (1 + V(x)) \|f\|_V$$


for all $x \in M$ and $t \ge 0$.
Proof Relying on Proposition 6.3, one can find a point $q$ and a time $T = mT_1$ (with
$m \in \mathbb{N}^*$ sufficiently large) such that $q$ is an accessible Doeblin point for $P_T$. By
Lemma 6.5, it is also aperiodic for $P_T$. By assumption (ii), $P_T V \le \rho^m V + \frac{\kappa}{1 - \rho}$.
Thus, by Corollary 8.14, there exist constants $0 < \gamma < 1$ and $C \ge 0$ such that for
all $n \in \mathbb{N}$ and $0 \le r < T$,
$$|P_{nT + r} f(x) - \pi(f)| = |P_T^n P_r f(x) - \pi(P_r f)| \le C \gamma^n (1 + V(x)) \|f\|_V.$$
Thus
$$|P_t f(x) - \pi(f)| \le \frac{C}{\gamma}\, \gamma^{t/T} (1 + V(x)) \|f\|_V$$
for all $x \in M$ and $t \ge 0$. $\square$
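The passage from a geometric rate along multiples of $T$ to an exponential rate in continuous time is elementary arithmetic. The sketch below assumes a discrete contraction factor $\gamma \in (0,1)$ per time step $T$ and sets $\alpha = -\log(\gamma)/T$; the function name and the numbers are hypothetical.

```python
import math


def continuous_rate(gamma, T):
    """Return alpha > 0 such that gamma**(t/T) == exp(-alpha*t)."""
    if not (0 < gamma < 1 and T > 0):
        raise ValueError("need 0 < gamma < 1 and T > 0")
    return -math.log(gamma) / T


# Hypothetical values of the discrete rate and of the time step.
gamma_demo, T_demo = 0.9, 2.5
alpha_demo = continuous_rate(gamma_demo, T_demo)

# With n = floor(t/T) one has n >= t/T - 1, hence
# gamma**n <= gamma**(t/T - 1) = exp(-alpha*t) / gamma.
rate_ok = all(
    gamma_demo ** math.floor(t / T_demo)
    <= math.exp(-alpha_demo * t) / gamma_demo + 1e-12
    for t in (0.0, 1.0, 2.5, 7.3, 40.0)
)
print(alpha_demo, rate_ok)
```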


Example 8.16 (Piecewise Deterministic Markov Processes) Consider the piecewise 
deterministic Markov process defined in Sect. 6.4. Suppose that there exist an
accessible point at which the strong bracket condition holds and a Lyapunov 
function as in Exercise 7.24. Then the conclusions of Theorem 8.15 hold. 


Example 8.17 (Stochastic Differential Equations) Consider the stochastic differential
equation introduced in Sect. 6.5. Suppose that there exist an accessible point at
which the Hörmander condition holds and a Lyapunov function as in Lemma 7.26.
Then the conclusions of Theorem 8.15 hold.


8.2.3 Coupling, Splitting, and Polynomial Convergence


This section is the natural counterpart of Sect. 2.7 on countable chains. It revisits the 
convergence theorems of the previous section and relates the rate of convergence to 
the moments of the return time to a recurrent small set. 


Theorem 8.18 Let $C \subset M$ be an aperiodic, recurrent small set for $P$.

(i) If $\sup_{x \in C} E_x(\tau_C) < \infty$, then $P$ is positive recurrent and, letting $\pi$ denote its
invariant probability measure,
$$\lim_{n \to \infty} |\mu P^n - \pi| = 0$$
for every $\mu \in \mathcal{P}(M)$.
(ii) If $\sup_{x \in C} E_x(\tau_C^p) < \infty$ for some $p \ge 2$, then there exists $c > 0$ such that for
every $\mu \in \mathcal{P}(M)$ and for every $n \in \mathbb{N}^*$,
$$|\mu P^n - \pi| \le \frac{c}{n^{p-1}} \bigl(1 + E_\mu(\tau_C^{p-1})\bigr).$$
(iii) If $\sup_{x \in C} E_x(e^{\lambda_0 \tau_C}) < \infty$ for some $\lambda_0 > 0$, then there exist $0 < \lambda < \lambda_0$ and
$c > 0$ such that for every $\mu \in \mathcal{P}(M)$ and for every $n \in \mathbb{N}^*$,
$$|\mu P^n - \pi| \le e^{-\lambda n} c \bigl(1 + E_\mu(e^{\lambda \tau_C})\bigr).$$


Proof Positive recurrence follows from Theorem 7.18. The rest of the proof relies
on a coupling argument that goes back to Harris [37] and Nummelin [52]. Let C be
an aperiodic recurrent set for P. We proceed in two steps.


Step 1 We first assume that $C$ is an atom, meaning that there exists a probability
measure $\xi$ on $M$ such that for all $x \in C$, $P(x, \cdot) = \xi(\cdot)$. In this situation the proof
is very much like the proof given for a countable Markov chain (Theorem 2.35).
Let $(X_n)$ and $(Y_n)$ be two independent chains (induced by $P$), $P_{\mu \otimes \nu}$ the law of
$((X_n, Y_n))_{n \ge 0}$ when $(X_0, Y_0)$ has law $\mu \otimes \nu$, and let
$$\tau_{C \times C} = \min\{n \ge 1 : X_n \in C, \ Y_n \in C\}.$$
Since $C$ is an atom, for all $\mu, \nu \in \mathcal{P}(M)$ and $n \in \mathbb{N}^*$,
$$P_{\mu \otimes \nu}(X_n \in \cdot\,; \ \tau_{C \times C} < n) = P_{\mu \otimes \nu}(Y_n \in \cdot\,; \ \tau_{C \times C} < n).$$
Hence
$$|\mu P^n - \pi| = |\mu P^n - \pi P^n| \le P_{\mu \otimes \pi}(\tau_{C \times C} \ge n), \qquad (8.6)$$
where $\pi$ is the unique invariant probability measure of $P$. Let now $(\tau_C^{(n)})$ (respectively
$(\tilde\tau_C^{(n)})$) denote the successive hitting times of $C$ by $(X_n)$ (respectively $(Y_n)$).
The assumption that $C$ is an aperiodic atom makes the processes $T := (\tau_C^{(n)})_{n \ge 1}$
and $\tilde T := (\tilde\tau_C^{(n)})_{n \ge 1}$ two aperiodic independent renewal processes (see Sect. 2.6)
and $\tau_{C \times C}$ is their first common renewal time. The additional assumption that
$\sup_{x \in C} E_x(\tau_C) < \infty$ makes these processes $L^1$ (as defined in Sect. 2.6) so that
$\tau_{C \times C} < \infty$ almost surely (see Equation (2.5) and the discussion preceding it).
Together with (8.6), this proves the first assertion. To prove the second assertion,
observe that by (8.6), Markov's inequality, and Theorem 2.33, one has for all
$0 < q < p$ that


$$|\mu P^n - \pi| \le \frac{1}{n^q} E_{\mu \otimes \pi}(\tau_{C \times C}^q) \le \frac{c}{n^q} \bigl(1 + E_\mu(\tau_C^q) + E_\pi(\tau_C^q)\bigr).$$
The problem then reduces to estimating $E_\pi(\tau_C^q)$. Here again, the assumption that $C$
is an atom will prove to be very useful. As for countable Markov chains, $\pi$ can be
explicitly written as
$$\pi(f) = \frac{E_x\bigl(f(X_0) + \dots + f(X_{\tau_C - 1})\bigr)}{E_x(\tau_C)} = \pi(C)\, E_x\bigl(f(X_0) + \dots + f(X_{\tau_C - 1})\bigr)$$
for any $x \in C$ and all $f \ge 0$ measurable. The proof is similar to the proof of
assertion (iii) in Theorem 2.6 (compare to Exercise 4.24) and left to the reader.
Applying this formula to the map $y \mapsto E_y(\Psi(\tau_C))$ for some nonnegative function
$\Psi$ leads to
$$E_\pi(\Psi(\tau_C)) = \pi(C)\, E_x\Bigl(\sum_{k=0}^{\tau_C - 1} \Psi(\tau_C - k)\Bigr)$$
for all $x \in C$, exactly as in Proposition 2.10. In particular
$$E_\pi(\tau_C^q) \le \pi(C)\, E_x(\tau_C^{q+1})$$
for all $x \in C$. With $q = p - 1$, this estimate yields
$$E_\pi(\tau_C^{p-1}) \le \pi(C) \sup_{x \in C} E_x(\tau_C^p) < \infty,$$
which concludes the proof of the second assertion.
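The proof of the third assertion is similar. By Markov's inequality and Theorem 2.34
there exists $0 < \lambda < \lambda_0$ such that
$$|\mu P^n - \pi| \le e^{-\lambda n} E_{\mu \otimes \pi}(e^{\lambda \tau_{C \times C}}) \le e^{-\lambda n} c \bigl(1 + E_\mu(e^{\lambda \tau_C}) + E_\pi(e^{\lambda \tau_C})\bigr).$$
And for all $x \in C$,
$$E_\pi(e^{\lambda \tau_C}) = \pi(C)\, E_x\Bigl(e^{\lambda}\, \frac{e^{\lambda \tau_C} - 1}{e^{\lambda} - 1}\Bigr).$$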

Step 2 We suppose now that $C$ is a small set with minorizing measure $\xi$. Let $\varepsilon =
\xi(M) < 1$, $\psi(\cdot) = \xi(\cdot)/\varepsilon$, and let $K$ be the kernel on $C$ defined by
$$K(x, \cdot) = \frac{P(x, \cdot) - \varepsilon\psi(\cdot)}{1 - \varepsilon}.$$
The idea of the splitting method consists in constructing a Markov chain $(X_n)$ with
kernel $P$ with the help of an auxiliary sequence $(I_n)$, $I_n \in \{0, 1\}$. If $X_n \notin C$, then $I_n$
is set to 0. If $X_n \in C$, $I_n$ is randomly chosen according to a Bernoulli distribution
with parameter $\varepsilon$. At the next step, $X_{n+1}$ is distributed according to
$$P(X_n, \cdot)\, \mathbf{1}_{\{X_n \in M \setminus C\}} + \bigl[(1 - I_n) K(X_n, \cdot) + I_n \psi(\cdot)\bigr] \mathbf{1}_{\{X_n \in C\}}.$$
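More formally, consider the Markov kernel $Q$ defined on
$$\tilde M = \{(x, i) \in M \times \{0, 1\} : x \notin C \Rightarrow i = 0\}$$
as follows: For all $x \in M \setminus C$,
$$\begin{aligned}
Q(x, 0; dy \times \{0\}) &= P(x, dy)(1 - \varepsilon \mathbf{1}_C(y)), \\
Q(x, 0; dy \times \{1\}) &= P(x, dy)\, \varepsilon \mathbf{1}_C(y),
\end{aligned}$$
and for all $x \in C$,
$$\begin{aligned}
Q(x, 0; dy \times \{0\}) &= K(x, dy)(1 - \varepsilon \mathbf{1}_C(y)), \\
Q(x, 0; dy \times \{1\}) &= K(x, dy)\, \varepsilon \mathbf{1}_C(y), \\
Q(x, 1; dy \times \{0\}) &= \psi(dy)(1 - \varepsilon \mathbf{1}_C(y)), \\
Q(x, 1; dy \times \{1\}) &= \psi(dy)\, \varepsilon \mathbf{1}_C(y).
\end{aligned}$$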
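On a finite state space the split kernel can be written out explicitly. The sketch below is a minimal illustration, assuming a hypothetical three-state transition matrix with small set $C = \{0, 1\}$, a minorization constant $\varepsilon = 0.5$ and a hypothetical probability vector $\psi$ satisfying $P(x, \cdot) \ge \varepsilon\psi(\cdot)$ on $C$. It builds $Q$ from the six formulas above and checks that averaging the second coordinate with weights $(1 - \varepsilon, \varepsilon)$ on $C$ recovers the original kernel $P$.

```python
def split_kernel(P, C, eps, psi):
    """Build the split kernel Q as a dict {(x, i): {(y, j): probability}}."""
    m = len(P)
    Q = {}
    for x in range(m):
        for i in ((0, 1) if x in C else (0,)):
            if x not in C:
                base = P[x]                       # ordinary move outside C
            elif i == 0:
                base = [(P[x][y] - eps * psi[y]) / (1 - eps) for y in range(m)]  # residual kernel K
            else:
                base = psi                        # regeneration from psi
            row = {}
            for y in range(m):
                if y in C:
                    row[(y, 1)] = base[y] * eps
                    row[(y, 0)] = base[y] * (1 - eps)
                else:
                    row[(y, 0)] = base[y]
            Q[(x, i)] = row
    return Q


def x_marginal(Q, C, eps, x, y):
    """Probability that X_1 = y when X_0 = x and I_0 is Bernoulli(eps) on C."""
    def mass(i):
        return sum(p for (z, _), p in Q[(x, i)].items() if z == y)
    if x in C:
        return (1 - eps) * mass(0) + eps * mass(1)
    return mass(0)


# Hypothetical data, chosen so that P(x, .) >= eps * psi on C.
P_example = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]
C_example = {0, 1}
eps_example = 0.5
psi_example = [4 / 9, 3 / 9, 2 / 9]
Q_example = split_kernel(P_example, C_example, eps_example, psi_example)

for state, row in Q_example.items():
    print(state, row)
```

The check on the first coordinate is the finite-state counterpart of the statement that the first component of the split chain is again a Markov chain with kernel $P$.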
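We let $(X_n, I_n)$ denote the canonical process on $(\Omega, \mathcal{F}) = (\tilde M^{\mathbb{N}}, \mathcal{B}(\tilde M)^{\otimes \mathbb{N}})$, $\mathcal{F}_n =
\sigma((X_i, I_i)_{i \le n})$, and for each $\nu \in \mathcal{P}(\tilde M)$, $\mathbb{P}_\nu$ the Markov measure on $\Omega$ making
$(X_n, I_n)$ a Markov chain with kernel $Q$ (with respect to $(\mathcal{F}_n)$) and initial law $\nu$. As
usual we write $\mathbb{P}_{x,i}$ for $\mathbb{P}_{\delta_{(x,i)}}$. We shall also use the following convenient notation:
$$\mathbb{P}_x := \mathbb{P}_{x,0} \ \text{ if } x \in M \setminus C, \qquad \mathbb{P}_x := (1 - \varepsilon)\mathbb{P}_{x,0} + \varepsilon \mathbb{P}_{x,1} \ \text{ if } x \in C.$$
Let $\mathcal{G}_n = \sigma((X_i)_{i \le n})$. It is not hard to verify (but still a good and recommended
exercise) that
$$\mathbb{P}_\nu(X_{n+1} \in \cdot \mid \mathcal{G}_n) = P(X_n, \cdot)$$
for all $n \ge 1$ and $\nu \in \mathcal{P}(\tilde M)$, and that
$$\mathbb{P}_x(X_1 \in \cdot) = P(x, \cdot).$$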

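This shows that $(X_n)_{n \ge 0}$ is a Markov chain with kernel $P$ and initial value $X_0 = x$
on $(\Omega, \mathcal{F}, (\mathcal{G}_n), \mathbb{P}_x)$.
We claim that:

(a) $C \times \{1\}$ is a recurrent aperiodic atom for $Q$;
(b) If, for some $p \ge 1$, $\sup_{x \in C} E_x(\tau_C^p) < \infty$, then there exist $a, b \ge 0$ such that
for all $(x, i) \in \tilde M$
$$\mathbb{E}_{x,i}(\tau_{C \times \{1\}}^p) \le a\, \mathbb{E}_{x,i}(\tau_C^p) + b;$$
(c) If, for some $\lambda_0 > 0$, $\sup_{x \in C} E_x(e^{\lambda_0 \tau_C}) < \infty$, then there exist $a \ge 0$ and
$0 < \lambda < \lambda_0$ such that for all $(x, i) \in \tilde M$
$$\mathbb{E}_{x,i}(e^{\lambda \tau_{C \times \{1\}}}) \le a\, \mathbb{E}_{x,i}(e^{\lambda \tau_C}).$$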

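Assume the claims are proved. Then, by step 1, $(X_n, I_n)$ is positive recurrent,
and so is $(X_n)$. As $n \to \infty$, the sequence of probability measures $P^n(x, \cdot) =
\mathbb{P}_x(X_n \in \cdot)$ converges in total variation toward $\pi$, the invariant probability measure
of $P$. If $\sup_{x \in C} E_x(\tau_C^p) < \infty$ for some $p \ge 2$, then, by (b) in the claim,
$\sup_{x \in C} \mathbb{E}_{x,1}(\tau_{C \times \{1\}}^p) < \infty$. Thus, by step 1,
$$|P^n(x, A) - \pi(A)| = |\mathbb{P}_x(X_n \in A) - \pi(A)| \le \frac{c}{n^{p-1}} \bigl(1 + \mathbb{E}_x(\tau_{C \times \{1\}}^{p-1})\bigr) \le \frac{c}{n^{p-1}} \bigl(1 + a\, E_x(\tau_C^{p-1}) + b\bigr)$$
for every $x \in M$ and $A \in \mathcal{B}(M)$. Thus, for every $\mu \in \mathcal{P}(M)$,
$$|\mu P^n - \pi| = 2 \sup_{A \in \mathcal{B}(M)} |\mu P^n(A) - \pi(A)| \le \frac{2c}{n^{p-1}} \bigl(1 + a\, E_\mu(\tau_C^{p-1}) + b\bigr).$$
This proves the second assertion. The proof of the third one is similar.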


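We now prove the claims. Clearly $C \times \{1\}$ is an atom for $Q$. Identify $C$ with the
subset of $\tilde M$ consisting of points $(x, i)$ such that $x \in C$. Under this identification,
$C \times \{1\} \subset C$ and we rely on Proposition 7.16 to prove the claim. By the assumption
that $C$ is recurrent for $P$, for all $x \in M$,
$$1 = \mathbb{P}_x(\tau_C < \infty) = \begin{cases} (1 - \varepsilon)\mathbb{P}_{x,0}(\tau_C < \infty) + \varepsilon \mathbb{P}_{x,1}(\tau_C < \infty) & \text{if } x \in C, \\ \mathbb{P}_{x,0}(\tau_C < \infty) & \text{if } x \in M \setminus C. \end{cases}$$
Thus, for all $(x, i) \in \tilde M$, $\mathbb{P}_{x,i}(\tau_C < \infty) = 1$, showing that $C$ is recurrent for $Q$.
Also,
$$\mathbb{P}_{x,i}\bigl((X_{\tau_C}, I_{\tau_C}) \in C \times \{1\}\bigr) = \varepsilon$$
because $\mathbb{P}_{x,i}((X_{\tau_C}, I_{\tau_C}) \in C \times \{1\} \mid \mathcal{G}_{\tau_C}) = \varepsilon$. Thus, by Proposition 7.16 (i), $C \times \{1\}$
is recurrent for $Q$. We now prove that it is aperiodic. For $x \in C$, $j, k \ge 1$,
$$\mathbb{P}_{x,1}(X_{j+k} \in C, \ I_{j+k} = 1) = \varepsilon\, \mathbb{P}_{x,1}(X_{j+k} \in C) \ge \varepsilon\, \mathbb{E}_{x,1}\bigl(\mathbf{1}_{\{\tau_C = j\}} P^k(X_{\tau_C}, C)\bigr) \ge \varepsilon\, \mathbb{P}_{x,1}(\tau_C = j) \inf_{x \in C} P^k(x, C).$$
Since $C \times \{1\}$ is an atom, $\mathbb{P}_{x,1}(\tau_C = j)$ does not depend on $x \in C$ and is $> 0$ for
some $j = j_0 \ge 1$. By aperiodicity of $C$ for $P$, there exists $n_0 \in \mathbb{N}$ such that for all
$k \ge n_0$
$$\inf_{x \in C} P^k(x, C) > 0.$$
Therefore $\inf_{x \in C} \mathbb{P}_{x,1}(X_k \in C, \ I_k = 1) > 0$ for all $k \ge n_0 + j_0$. This proves
aperiodicity and concludes the proof of claim (a).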

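If $\sup_{x \in C} E_x(\tau_C^p) < \infty$ for some $p \ge 1$, then $\sup_{x \in C} \mathbb{E}_{x,i}(\tau_C^p) < \infty$ for $i \in
\{0, 1\}$, and by Proposition 7.16 (ii), $\sup_{x \in C} \mathbb{E}_{x,i}(\tau_{C \times \{1\}}^p) < \infty$. Now
$$\tau_{C \times \{1\}} \le \tau_C + \tau_{C \times \{1\}} \circ \Theta_{\tau_C},$$
so that
$$\mathbb{E}_{x,i}(\tau_{C \times \{1\}}^p) \le a \Bigl(\mathbb{E}_{x,i}(\tau_C^p) + \sup_{x \in C, \, i = 0, 1} \mathbb{E}_{x,i}(\tau_{C \times \{1\}}^p)\Bigr) = a\, \mathbb{E}_{x,i}(\tau_C^p) + b.$$
Claim (c) is proved similarly. $\square$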


Remark 8.19 It is interesting to compare Theorems 8.12 and 8.18 (iii). Under the
assumptions of Theorem 8.12, the set $C = \{V \le R\}$, with $R$ sufficiently large, satisfies
condition (iii) of Theorem 8.18 for a suitable $\lambda_0 > 0$. This follows from Proposition 7.12 (iii)
or Proposition 7.14 (choose $\phi(s) = \lambda_0 s$). Then, by Theorem 8.18,
$|P^n f(x) - \pi(f)| \le e^{-\lambda n} c (1 + V(x)) \|f\|_\infty$ for all $f \in B(M)$. Observe however
that the conclusion of Theorem 8.12 is stronger, in the sense that it allows one to deal
with functions that are unbounded but majorized by $1 + V$ times a constant.


8.3 Convergence in Wasserstein Distance 


Let $H$ be a separable real Hilbert space with norm $\|\cdot\|$ and let $P$ be a Markov
kernel on $(H, \mathcal{B}(H))$. Let $\mathcal{F}(H)$ be the space of bounded functions $f : H \to \mathbb{R}$
with bounded and continuous Fréchet derivative, as defined in Sect. 5.3.2. Recall
from Sect. 5.3 that for a bounded metric $d$ on $H$, $\mathrm{Lip}_1(d)$ denotes the set of Borel
measurable functions $\phi : H \to \mathbb{R}$ such that $|\phi(x) - \phi(y)| \le d(x, y)$ for every
$x, y \in H$. Also recall that
$$\|\mu - \nu\|_d = \sup_{\phi \in \mathrm{Lip}_1(d)} (\mu\phi - \nu\phi), \quad \mu, \nu \in \mathcal{P}(H).$$
If $(H, d)$ is Polish, then
$$\|\mu - \nu\|_d = W_1(\mu, \nu) := \inf_{\Gamma \in \mathcal{C}(\mu, \nu)} \int_{H^2} d(x, y)\, \Gamma(dx, dy)$$
and the metric $W_1$ on $\mathcal{P}(H)$ is called the Wasserstein distance of order 1 (or simply
Wasserstein distance) corresponding to $d$, see Remark 5.36.

The following theorem provides conditions under which the mapping $\mu \mapsto \mu P$
is a strong contraction in a certain Wasserstein distance. It is a discrete-time version
of Theorem 2.5 in [35], which was formulated for a continuous-time Markov
semigroup.


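Theorem 8.20 (Hairer, Mattingly) Assume that there exist constants $\alpha \in (0, 1)$
and $C > 0$ such that for every $f \in \mathcal{F}(H)$, one has $Pf \in \mathcal{F}(H)$ and
$$\|\nabla Pf\|_\infty \le C \|f\|_\infty + \alpha \|\nabla f\|_\infty. \qquad (8.7)$$
Define
$$\gamma := \frac{1 - \alpha}{2C}, \qquad B := \{(x, y) \in H^2 : \|x - y\| < \gamma/2\},$$
and assume that
$$a := \inf_{x, y \in H} \sup\{\Gamma(B) : \Gamma \in \mathcal{C}(\delta_x P, \delta_y P)\} > 0. \qquad (8.8)$$
Then there exists $\beta \in (0, 1)$ such that
$$\|\mu P - \nu P\|_d \le \beta \|\mu - \nu\|_d, \quad \forall \mu, \nu \in \mathcal{P}(H)$$
for the bounded metric $d(x, y) := 1 \wedge (\gamma^{-1} \|x - y\|)$. One can choose $\beta = \max\{(1 + \alpha)/2, \ 1 - a/4\}$.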


Notice that the condition in (8.7) implies that $P$ is asymptotically strong Feller
(see Theorem 5.30). The condition in (8.8) is relatively strong but, as we shall see,
allows for a short and transparent proof. In [35], Hairer and Mattingly also formulate
a set of Lyapunov-type conditions which imply that $\mu \mapsto \mu P$ is a strong contraction
in the Wasserstein distance corresponding to the metric
$$d(x, y) = \inf_{\gamma} \int_0^1 V(\gamma(s)) \|\dot\gamma(s)\|\, ds,$$
where $V$ is a suitable Lyapunov function and the infimum is taken over absolutely
continuous paths $\gamma : [0, 1] \to H$ such that $\gamma(0) = x$ and $\gamma(1) = y$. The latter
result is more broadly applicable, in particular to the two-dimensional stochastic
Navier-Stokes equation.


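Proof (Theorem 8.20) We first show that there exists $\beta \in (0, 1)$ such that
$$\|\delta_x P - \delta_y P\|_d \le \beta\, d(x, y), \quad \forall x, y \in H. \qquad (8.9)$$
Let $\phi \in \mathrm{Lip}_1(d)$. By Remark 5.30, there exists a sequence $(\phi_n)_{n \ge 1}$ in $\mathcal{F}(H) \cap
\mathrm{Lip}_1(d)$ such that
$$\lim_{n \to \infty} \phi_n(x) = \phi(x), \quad \forall x \in H.$$
Define $\tilde\phi$ and $(\tilde\phi_n)_{n \ge 1}$ as in the proof of Theorem 5.29. Then $\tilde\phi_n \in \mathcal{F}(H) \cap \mathrm{Lip}_1(d)$
and $\|\nabla \tilde\phi_n\|_\infty \le \gamma^{-1}$ for every $n \in \mathbb{N}^*$. By assumption, for every $n \in \mathbb{N}^*$, one has
$P\tilde\phi_n \in \mathcal{F}(H)$ and
$$\|\nabla P \tilde\phi_n\|_\infty \le C \|\tilde\phi_n\|_\infty + \alpha \|\nabla \tilde\phi_n\|_\infty \le C + \frac{2\alpha C}{1 - \alpha}.$$
As shown in the proof of Theorem 5.29, for $n \in \mathbb{N}^*$ and $x, y \in H$,
$$P\phi_n(x) - P\phi_n(y) \le \|x - y\| \, \|\nabla P \tilde\phi_n\|_\infty \le \|x - y\| \, \frac{C(1 + \alpha)}{1 - \alpha}.$$
Let $x, y \in H$ such that $\|x - y\| < \gamma$. Then $d(x, y) = \gamma^{-1} \|x - y\|$, so
$$P\phi_n(x) - P\phi_n(y) \le d(x, y)\, \gamma\, \frac{C(1 + \alpha)}{1 - \alpha} = d(x, y)(1 + \alpha)/2.$$
By bounded convergence,
$$P\phi(x) - P\phi(y) \le d(x, y)(1 + \alpha)/2.$$
As this estimate holds for all $\phi \in \mathrm{Lip}_1(d)$,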


$$\|\delta_x P - \delta_y P\|_d \le d(x, y)(1 + \alpha)/2.$$
Now let $x, y \in H$ such that $\|x - y\| \ge \gamma$. Then there exists $\Gamma \in \mathcal{C}(\delta_x P, \delta_y P)$ such
that
$$\Gamma(B) > a/2.$$
The space $H$ with the norm-induced metric is Polish, and $d$ is a bounded continuous
metric on $H$. By Kantorovich–Rubinstein duality (Theorem 5.34),
$$\begin{aligned}
\|\delta_x P - \delta_y P\|_d &= \inf_{\Gamma' \in \mathcal{C}(\delta_x P, \delta_y P)} \int_{H^2} d(a, b)\, \Gamma'(da, db) \\
&\le \int_{H^2} d(a, b)\, \Gamma(da, db) \\
&= \int_B d(a, b)\, \Gamma(da, db) + \int_{H^2 \setminus B} d(a, b)\, \Gamma(da, db).
\end{aligned}$$
For $(a, b) \in B$, one has
$$d(a, b) \le \gamma^{-1} \|a - b\| < 1/2.$$
Hence
$$\int_B d(a, b)\, \Gamma(da, db) \le \tfrac{1}{2} \Gamma(B).$$
And since the metric $d$ is bounded by 1,
$$\int_{H^2 \setminus B} d(a, b)\, \Gamma(da, db) \le \Gamma(H^2 \setminus B) = 1 - \Gamma(B).$$
As a result,
$$\|\delta_x P - \delta_y P\|_d \le 1 - \tfrac{1}{2} \Gamma(B) \le 1 - \frac{a}{4}.$$
Our assumption that $\|x - y\| \ge \gamma$ implies that $d(x, y) = 1$. Hence
$$\|\delta_x P - \delta_y P\|_d \le d(x, y) \Bigl(1 - \frac{a}{4}\Bigr).$$
This proves (8.9) for $\beta = \max\{(1 + \alpha)/2, \ 1 - a/4\}$. To complete the proof of
Theorem 8.20, let $\mu, \nu \in \mathcal{P}(H)$. By Theorem 5.34, there exists $\Gamma^* \in \mathcal{C}(\mu, \nu)$ such
that


$$\|\mu - \nu\|_d = \int_{H^2} d(x, y)\, \Gamma^*(dx, dy).$$
Let $\phi \in \mathrm{Lip}_1(d)$. Then, by Exercise 5.38,
$$(\mu P)\phi - (\nu P)\phi = \int_{H^2} \bigl((\delta_x P)\phi - (\delta_y P)\phi\bigr)\, \Gamma^*(dx, dy). \qquad (8.10)$$
For $x, y \in H$,
$$(\delta_x P)\phi - (\delta_y P)\phi \le \|\delta_x P - \delta_y P\|_d \le \beta\, d(x, y).$$
Hence, the right-hand side of (8.10) is dominated by
$$\beta \int_{H^2} d(x, y)\, \Gamma^*(dx, dy) = \beta \|\mu - \nu\|_d.$$
Taking the supremum for the left-hand side of (8.10) over all $\phi \in \mathrm{Lip}_1(d)$ yields the
desired contraction estimate. $\square$


Corollary 8.21 Under the assumptions of Theorem 8.20, the Markov kernel $P$
admits a unique invariant probability measure $\pi$ and there exists $\beta \in (0, 1)$ such
that for every $\mu \in \mathcal{P}(H)$,
$$\|\mu P^n - \pi\|_d \le \beta^n \|\mu - \pi\|_d, \quad \forall n \in \mathbb{N}^*.$$

Proof Clearly, $d$ induces the same topology on $H$ as the metric induced by $\|\cdot\|$.
Then $(H, d)$ is a Polish space with a bounded metric. By Remark 5.36, $\mathcal{P}(H)$
endowed with the metric $(\mu, \nu) \mapsto \|\mu - \nu\|_d$ is Polish. Since $\mu \mapsto \mu P$ is a strong
contraction on this complete metric space, the Banach fixed point theorem yields
existence and uniqueness of the invariant probability measure $\pi$. For $\mu \in \mathcal{P}(H)$
and $n \in \mathbb{N}^*$, one has
$$\|\mu P^n - \pi\|_d = \|\mu P^n - \pi P^n\|_d \le \beta^n \|\mu - \pi\|_d,$$
where $\beta$ is the constant from Theorem 8.20. $\square$
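The constants of Theorem 8.20 and the resulting convergence speed can be made concrete. The sketch below is an illustration under stated assumptions: it takes $\gamma = (1-\alpha)/(2C)$ and the contraction factor $\beta = \max\{(1+\alpha)/2, 1 - a/4\}$, the second term being the value that the proof's bound $\Gamma(B) > a/2$ produces, and it uses the fact that $\|\mu - \pi\|_d \le 1$ because $d \le 1$. Function names and numerical inputs are hypothetical.

```python
import math


def contraction_rate(alpha, C, a):
    """Return (gamma, beta) for given alpha in (0, 1), C > 0 and a in (0, 1]."""
    gamma = (1 - alpha) / (2 * C)
    beta = max((1 + alpha) / 2, 1 - a / 4)
    return gamma, beta


def truncated_metric(dist, gamma):
    """d(x, y) = min(1, ||x - y|| / gamma), with dist standing for ||x - y||."""
    return min(1.0, dist / gamma)


def steps_for_tolerance(beta, tol):
    """Smallest n >= 1 with beta**n <= tol, so that ||mu P^n - pi||_d <= tol."""
    return max(1, math.ceil(math.log(tol) / math.log(beta)))


# Hypothetical constants.
gamma_demo, beta_demo = contraction_rate(alpha=0.5, C=1.0, a=0.4)
n_demo = steps_for_tolerance(beta_demo, 1e-3)
print(gamma_demo, beta_demo, n_demo)
```

A small coupling constant $a$ pushes $\beta$ close to 1, so in this example the number of steps is governed by the coupling condition (8.8) rather than by the gradient estimate (8.7).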


Appendix A 
Monotone Class and Martingales 


A.1 Monotone Class Theorem 


A set $H \subset B(M)$ is said to be stable by bounded monotone convergence if $f_n \in H$
and $0 \le f_n \le f_{n+1} \le 1$ implies that $f = \lim_n f_n \in H$.

Theorem A.1 (Monotone Class Theorem) Let $H \subset B(M)$ be a vector space
of bounded functions containing the constant functions and stable by bounded
monotone convergence. Let $C \subset B(M)$ be a set stable by multiplication and let
$\sigma(C)$ denote the $\sigma$-field generated by $C$ (i.e., the smallest $\sigma$-field for which the
elements of $C$ are measurable). If $C \subset H$, then $H$ contains every bounded $\sigma(C)$-measurable
function.


A.2 Conditional Expectation 


We recall here the definition of conditional expectation and give some of its basic 
properties. More details and proofs can be found in standard textbooks such as [7]. 

Let $(\Omega, \mathcal{F}, \mathsf{P})$ be a probability space and let $\mathcal{B}$ be a $\sigma$-field contained in $\mathcal{F}$. Let
$X$ be a real-valued random variable such that $\mathsf{E}(|X|) < \infty$. Then there exists a
real-valued random variable $Z$ with $\mathsf{E}(|Z|) < \infty$ such that

(i) $Z$ is $\mathcal{B}$-measurable;
(ii) For all $A \in \mathcal{B}$, we have
$$\mathsf{E}(Z \mathbf{1}_A) = \mathsf{E}(X \mathbf{1}_A).$$
The random variable $Z$ is unique in the following sense: If $Z'$ is any other random
variable satisfying $\mathsf{E}(|Z'|) < \infty$ and the conditions in (i) and (ii), then $\mathsf{P}(Z' =
Z) = 1$. In other words, the space of equivalence classes $L^1(\Omega, \mathcal{B}, \mathsf{P})$ has a unique
element $Z$ satisfying the condition in (ii). This element of $L^1(\Omega, \mathcal{B}, \mathsf{P})$ is called
the conditional expectation of $X$ given $\mathcal{B}$, and is denoted by $\mathsf{E}(X|\mathcal{B})$. If we write
$Y = \mathsf{E}(X|\mathcal{B})$ for some $\mathcal{B}$-measurable random variable $Y$, we mean that $Y$ is a
representative of the equivalence class $\mathsf{E}(X|\mathcal{B})$.

One can also define conditional expectation for nonnegative random variables:
Let $X : \Omega \to [0, \infty]$ be measurable, i.e., $\{\omega \in \Omega : X(\omega) \in A\} \in \mathcal{F}$ for every set
$A \subset [0, \infty]$ such that $A \setminus \{\infty\}$ is a Borel subset of $[0, \infty)$. For every $n \in \mathbb{N}$, let
$X_n := X \wedge n$ and let $Z_n$ be a $\mathcal{B}$-measurable random variable such that $\mathsf{E}(|Z_n|) < \infty$
and $\mathsf{E}(Z_n \mathbf{1}_A) = \mathsf{E}(X_n \mathbf{1}_A)$ for every $A \in \mathcal{B}$. By changing the values of $(Z_n)$ on a
set of measure 0 if necessary, one can assume that $(Z_n(\omega))_{n \in \mathbb{N}}$ is nondecreasing for
every $\omega \in \Omega$. The function
$$Z(\omega) = \lim_{n \to \infty} Z_n(\omega)$$
then maps from $\Omega$ to $[0, \infty]$ and satisfies the conditions in (i) and (ii). If $Z' : \Omega \to
[0, \infty]$ is any other random variable satisfying (i) and (ii), then $\mathsf{P}(Z = Z') = 1$.
On the set of $\mathcal{B}$-measurable functions from $\Omega$ to $[0, \infty]$, consider the equivalence
relation given by equality $\mathsf{P}$-almost surely. The conditional expectation of $X$ given
$\mathcal{B}$, denoted by $\mathsf{E}(X|\mathcal{B})$, is defined as the unique equivalence class that satisfies (ii).


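Theorem A.2 (Properties of Conditional Expectation) Let $X$ be a random variable,
with $\mathsf{E}(|X|) < \infty$ or $X \in [0, \infty]$, and let $\mathcal{B}$ be a $\sigma$-field contained in $\mathcal{F}$.
Then

(i) $\mathsf{E}(\mathsf{E}(X|\mathcal{B})) = \mathsf{E}(X)$;
(ii) If $\mathsf{E}(|X|) < \infty$ (resp. $X \in [0, \infty]$), we have for every $\mathcal{B}$-measurable random
variable $Y$ with $\mathsf{E}(|XY|) < \infty$ (resp. $Y \in [0, \infty]$)
$$\mathsf{E}(XY|\mathcal{B}) = Y\, \mathsf{E}(X|\mathcal{B}),$$
with the convention that $0 \cdot \infty = 0$;
(iii) For every $\sigma$-field $\mathcal{A}$ contained in $\mathcal{B}$, we have
$$\mathsf{E}(\mathsf{E}(X|\mathcal{B})|\mathcal{A}) = \mathsf{E}(X|\mathcal{A}).$$
This is often called the tower property.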
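On a finite probability space, where a $\sigma$-field is generated by a partition, conditional expectation reduces to block averaging, and properties (i) and (iii) can be checked exactly. The following is a minimal sketch under that assumption; the sample space, the random variable and the two partitions are hypothetical, and every block is assumed to have positive probability.

```python
from fractions import Fraction


def cond_exp(X, prob, partition):
    """E(X | B) where B is generated by a finite partition of the sample space.

    X and prob map outcomes to values and probabilities; every block of the
    partition is assumed to have positive probability.
    """
    Z = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        average = sum(X[w] * prob[w] for w in block) / mass
        for w in block:
            Z[w] = average  # constant on each block, hence B-measurable
    return Z


def expectation(X, prob):
    return sum(X[w] * prob[w] for w in prob)


# Hypothetical example: a fair six-point space and X(w) = w^2.
prob = {w: Fraction(1, 6) for w in range(6)}
X = {w: Fraction(w * w) for w in range(6)}
fine = [{0, 1}, {2, 3}, {4, 5}]   # generates B
coarse = [{0, 1, 2, 3}, {4, 5}]   # generates A, contained in B

Z_fine = cond_exp(X, prob, fine)
tower_lhs = cond_exp(Z_fine, prob, coarse)   # E(E(X|B)|A)
tower_rhs = cond_exp(X, prob, coarse)        # E(X|A)
print(tower_lhs == tower_rhs, expectation(Z_fine, prob) == expectation(X, prob))
```

Exact rational arithmetic is used so that the two sides of each identity can be compared with equality rather than with a tolerance.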
A.3 Martingales


Here, we recall the few results from martingale theory that are used in this course. As 
for conditional expectation, there are many introductory texts on probability theory 
that provide more details and proofs, e.g., [69] or [7]. 

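Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathsf{P})$ be a filtered probability space, with filtration $\mathbb{F} = (\mathcal{F}_n)_{n \ge 0}$. We let $\mathcal{F}_\infty$ denote the $\sigma$-field
generated by $\bigcup_{n \ge 0} \mathcal{F}_n$. A sequence $(M_n)$ of adapted (i.e., $M_n$ is $\mathcal{F}_n$-measurable) and
$L^1$ real-valued random variables is called a martingale (respectively a submartingale,
respectively a supermartingale) if
$$\mathsf{E}(M_{n+1}|\mathcal{F}_n) = M_n, \quad \text{resp. } \ge, \ \text{resp. } \le$$
for all $n \ge 0$.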


A simple, but useful consequence of Jensen’s inequality is the following result. 


Proposition A.3 Let $(M_n)$ be a martingale (resp. a submartingale) and $\phi$ a convex
function (resp. a convex nondecreasing function) such that $\phi(M_n) \in L^1$. Then
$(\phi(M_n))$ is a submartingale.


It is often useful to extend the martingale (submartingale, supermartingale) 
property to stopping times. Doob's optional stopping theorem shows that this can 
be done for bounded stopping times. 


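Theorem A.4 (Optional Stopping) Let $M = (M_n)$ be a martingale (resp.
submartingale, supermartingale).

(i) If $T$ is a stopping time, then $(M_{n \wedge T})_{n \ge 0}$ is a martingale (resp. submartingale,
supermartingale);
(ii) If $S \le T$ are stopping times bounded by some constant $N$, then
$$\mathsf{E}(M_T|\mathcal{F}_S) = M_S, \quad \text{resp. } \ge, \ \text{resp. } \le.$$
Proof
(i) For all $n \in \mathbb{N}$
$$M_{(n+1) \wedge T} - M_{n \wedge T} = (M_{n+1} - M_n) \mathbf{1}_{\{T > n\}}.$$
Taking the conditional expectation with respect to $\mathcal{F}_n$ proves the result.
(ii) Assume $(M_n)$ is a martingale. Proving that $\mathsf{E}(M_T|\mathcal{F}_S) = M_S$ amounts
to proving that for all $A \in \mathcal{F}_S$ and $0 \le k \le N$, $\mathsf{E}(M_T \mathbf{1}_{A \cap \{S = k\}}) =
\mathsf{E}(M_k \mathbf{1}_{A \cap \{S = k\}})$. One has
$$\begin{aligned}
\mathsf{E}(M_T \mathbf{1}_{A \cap \{S = k\}}) &= \sum_{i=k}^{N} \mathsf{E}(M_i \mathbf{1}_{\{T = i\}} \mathbf{1}_{A \cap \{S = k\}}) = \sum_{i=k}^{N} \mathsf{E}\bigl(\mathsf{E}(M_N|\mathcal{F}_i) \mathbf{1}_{\{T = i\}} \mathbf{1}_{A \cap \{S = k\}}\bigr) \\
&= \sum_{i=k}^{N} \mathsf{E}\bigl(\mathsf{E}(M_N \mathbf{1}_{\{T = i\}} \mathbf{1}_{A \cap \{S = k\}}|\mathcal{F}_i)\bigr) = \mathsf{E}(M_N \mathbf{1}_{A \cap \{S = k\}}) \\
&= \mathsf{E}\bigl(\mathsf{E}(M_N \mathbf{1}_{A \cap \{S = k\}}|\mathcal{F}_k)\bigr) = \mathsf{E}(M_k \mathbf{1}_{A \cap \{S = k\}}).
\end{aligned}$$
The proof for sub- and supermartingales is similar.
$\square$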


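Corollary A.5 (Doob's Inequality) Let $(X_n)$ be a nonnegative submartingale.
Then, for all $\alpha > 0$,
$$\mathsf{P}\Bigl(\sup_{0 \le i \le N} X_i \ge \alpha\Bigr) \le \frac{\mathsf{E}(X_N)}{\alpha}.$$

Proof Let $T = \min\{i \ge 0 : X_i \ge \alpha\}$. Then $T \wedge N$ is a stopping time bounded by
$N$ so that, by the optional stopping theorem,
$$\mathsf{E}(X_N) \ge \mathsf{E}(X_{N \wedge T}) = \mathsf{E}(X_N \mathbf{1}_{\{T > N\}}) + \mathsf{E}(X_T \mathbf{1}_{\{T \le N\}}) \ge \alpha\, \mathsf{P}(T \le N).$$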
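Doob's inequality can be checked by exhaustive enumeration for a simple example. The sketch below assumes the nonnegative submartingale $X_n = |S_n|$, where $(S_n)$ is the simple symmetric random walk started at 0 (a choice made here only for illustration), and computes both sides exactly over all $2^N$ paths.

```python
from fractions import Fraction
from itertools import product


def doob_sides(N, alpha):
    """Return (P(max_{i<=N} |S_i| >= alpha), E|S_N| / alpha) for the simple
    symmetric random walk, by enumerating all 2**N equally likely paths."""
    paths = list(product((-1, 1), repeat=N))
    weight = Fraction(1, len(paths))
    prob_exceed = Fraction(0)
    mean_abs_end = Fraction(0)
    for increments in paths:
        position, running_max = 0, 0
        for e in increments:
            position += e
            running_max = max(running_max, abs(position))
        if running_max >= alpha:
            prob_exceed += weight
        mean_abs_end += weight * abs(position)
    return prob_exceed, mean_abs_end / alpha


for level in range(1, 6):
    lhs, rhs = doob_sides(10, level)
    print(level, float(lhs), float(rhs))
```

For small levels the right-hand side exceeds 1 and the bound is trivial; the inequality becomes informative only once the level is larger than $\mathsf{E}(X_N)$.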


The following two theorems are classical convergence results due to Doob.

Theorem A.6 Let $(M_n)$ be a submartingale. Assume that $\sup_n \mathsf{E}(M_n^+) < \infty$. Then
there exists $M_\infty \in L^1$ such that $M_n \to M_\infty$ almost surely.

Theorem A.7 Let $(M_n)$ be a martingale. Then the following assertions are equivalent:

(a) $(M_n)$ is uniformly integrable;
(b) $(M_n)$ converges almost surely and in $L^1$ to some random variable $M_\infty$;
(c) $M_n = \mathsf{E}(M|\mathcal{F}_n)$ for some $M \in L^1$.


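Furthermore, in case (c), $\lim_{n \to \infty} M_n = M_\infty = \mathsf{E}(M|\mathcal{F}_\infty)$.
Let $(M_n)$ be an $L^2$-martingale (i.e., $M_n \in L^2$). The predictable quadratic
variation of $(M_n)$ is the process $(\langle M \rangle_n)$ recursively defined as
$$\langle M \rangle_0 = 0, \qquad \langle M \rangle_{n+1} - \langle M \rangle_n = \mathsf{E}\bigl((M_{n+1} - M_n)^2|\mathcal{F}_n\bigr) = \mathsf{E}(M_{n+1}^2|\mathcal{F}_n) - M_n^2.$$
Note that $(\langle M \rangle_n)$ is nondecreasing, predictable (i.e., $\langle M \rangle_n$ is $\mathcal{F}_{n-1}$-measurable) and
that $(M_n^2 - \langle M \rangle_n)_n$ is a zero-mean martingale. We let $\langle M \rangle_\infty = \lim_{n \to \infty} \langle M \rangle_n$.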


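Theorem A.8 (Strong Law of Large Numbers) Let $(M_n)$ be an $L^2$-martingale.
Then

(i) If $\mathsf{E}(\langle M \rangle_\infty) = \sum_{k \ge 0} \mathsf{E}((M_{k+1} - M_k)^2) < \infty$, then $(M_n)$ converges almost
surely and in $L^2$ to some random variable $M_\infty$;
(ii) On $\{\langle M \rangle_\infty < \infty\}$, $(M_n)$ converges almost surely to some finite random variable
$M_\infty$;
(iii) On $\{\langle M \rangle_\infty = \infty\}$, $\lim_{n \to \infty} \frac{M_n}{\langle M \rangle_n} = 0$ a.s.
(iv) If $\sup_n \frac{\mathsf{E}(\langle M \rangle_n)}{n} < \infty$, then $\lim_{n \to \infty} \frac{M_n}{n} = 0$ a.s.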


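Proof We only prove the last statement, which allows for a short proof and which
is all that is needed in this book. By Doob's inequality, for all $n \in \mathbb{N}$ and $\varepsilon > 0$,
$$\mathsf{P}\Bigl(\sup_{2^n \le k < 2^{n+1}} \frac{|M_k|}{k} \ge \varepsilon\Bigr) \le \mathsf{P}\Bigl(\sup_{k < 2^{n+1}} |M_k| \ge \varepsilon 2^n\Bigr) \le \frac{1}{\varepsilon^2 2^{2n}} \mathsf{E}(\langle M \rangle_{2^{n+1}}) \le C\, \frac{2^{n+1}}{\varepsilon^2 2^{2n}} = \frac{2C}{\varepsilon^2 2^n},$$
where $C := \sup_n \mathsf{E}(\langle M \rangle_n)/n$, and the result follows from the Borel-Cantelli lemma. $\square$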
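Statement (iv) can be observed on the simplest possible example. The sketch below is an illustration only: it assumes the martingale $M_n = \xi_1 + \dots + \xi_n$ with independent fair signs $\xi_k = \pm 1$, for which $\langle M \rangle_n = n$, and uses a fixed seed so that the run is reproducible. A single simulated path is of course not a proof.

```python
import random


def martingale_average(n, seed=0):
    """Return M_n / n for M_n a sum of n independent fair +-1 steps."""
    rng = random.Random(seed)
    m = 0
    for _ in range(n):
        m += rng.choice((-1, 1))
    return m / n


def dyadic_bound(C, eps, n):
    """Bound 2*C / (eps**2 * 2**n) on the probability that |M_k|/k >= eps
    for some k in the dyadic block [2**n, 2**(n+1))."""
    return 2 * C / (eps ** 2 * 2 ** n)


for size in (10, 1000, 100000):
    print(size, martingale_average(size))

# The block bounds are summable in n, which is what Borel-Cantelli requires.
print(sum(dyadic_bound(1.0, 0.1, n) for n in range(60)))
```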